76 kDa, 32 kDa, AND 50 kDa HELICOBACTER POLYPEPTIDES AND
CORRESPONDING POLYNUCLEOTIDE MOLECULES

The invention relates to Helicobacter polypeptides and corresponding
polynucleotide molecules that can be used in methods to prevent or treat
Helicobacter infection in mammals, such as humans.

Background of the Invention 
Helicobacter is a genus of spiral, gram-negative bacteria that colonize
the gastrointestinal tracts of mammals. Several species colonize the
stomach, most notably H. pylori, H. heilmanii, H. felis, and H. mustelae.
Although H. pylori is the species most commonly associated with human
infection, H. heilmanii and H. felis have also been isolated from humans,
but at lower frequencies than H. pylori. Helicobacter infects over 50% of
adult populations in developed countries and nearly 100% in developing
countries and some Pacific rim countries, making it one of the most
prevalent infections worldwide.

Helicobacter is routinely recovered from gastric biopsies of humans with
histological evidence of gastritis and peptic ulceration. Indeed, H. pylori
is now recognized as an important pathogen of humans, in that the chronic
gastritis it causes is a risk factor for the development of peptic ulcer
diseases and gastric carcinoma. It is thus highly desirable to develop safe
and effective vaccines for preventing and treating Helicobacter infection.

A number of Helicobacter antigens have been characterized or isolated.
These include urease, which is composed of two structural subunits of
approximately 30 and 67 kDa (Hu et al., Infect. Immun. 58:992, 1990; Dunn
et al., J. Biol. Chem. 265:9464, 1990; Evans et al., Microbial Pathogenesis
10:15, 1991; Labigne et al., J. Bact. 173:1920, 1991); the 87 kDa vacuolar
cytotoxin (VacA) (Cover et al., J. Biol. Chem. 267:10570, 1992; Phadnis et
al., Infect. Immun. 62:1557, 1994; WO 93/18150); a 128 kDa immunodominant
antigen associated with the cytotoxin (CagA, also called TagA; WO 93/18150;
U.S. Patent No. 5,403,924); 13 and 58 kDa heat shock proteins HspA and HspB
(Suerbaum et al., Mol. Microbiol. 14:959, 1994; WO 93/18150); a 54 kDa
catalase (Hazell et al., J. Gen. Microbiol. 137:57, 1991); a 15 kDa
histidine-rich protein (Hpn) (Gilbert et al., Infect. Immun. 63:2682,
1995); a 20 kDa membrane-associated lipoprotein (Kostrcynska et al., J.
Bact. 176:5938, 1994); a 30 kDa outer membrane protein (Bolin et al., J.
Clin. Microbiol. 33:381, 1995); a lactoferrin receptor (FR 2,724,936); and
several porins, designated HopA, HopB, HopC, HopD, and HopE, which have
molecular weights of 48-67 kDa (Exner et al., Infect. Immun. 63:1567, 1995;
Doig et al., J. Bact. 177:5447, 1995). Some of these proteins have been
proposed as potential vaccine antigens. In particular, urease is believed
to be a vaccine candidate (WO 94/9823; WO 95/22987; WO 95/3824; Michetti et
al., Gastroenterology 107:1002, 1994). Nevertheless, it is thought that
several antigens may ultimately be necessary in a vaccine.



Summary of the Invention 
The invention provides polynucleotide molecules that encode a family of
76 kDa Helicobacter polypeptides, designated GHPO 386, GHPO 789, GHPO 1516,
GHPO 1197, GHPO 1180, GHPO 896, GHPO 711, GHPO 190, GHPO 185, GHPO 1417,
and GHPO 1414, a 32 kDa polypeptide, designated GHPO 1360, and a 50 kDa
polypeptide, designated GHPO 750, which can be used, e.g., in methods to
prevent, treat, or diagnose Helicobacter infection. The polypeptides
include those having the amino acid sequences shown in SEQ ID NOs:2-22
(even numbers), 66, and 68. Those skilled in the art will understand that
the invention also includes polynucleotide molecules that encode mutants
and derivatives of these polypeptides, which can result from the addition,
deletion, or substitution of non-essential amino acids, as is described
further below.

In addition to the polynucleotide molecules described above, the 
invention includes the corresponding polypeptides (i.e., polypeptides encoded 
by the polynucleotide molecules of the invention, or fragments thereof), and 
monospecific antibodies that specifically bind to these polypeptides. 

The present invention has many applications and includes expression
cassettes, vectors, and cells transformed or transfected with the
polynucleotides of the invention. Accordingly, the present invention
provides (i) methods for producing polypeptides of the invention in
recombinant host systems and related expression cassettes, vectors, and
transformed or transfected cells; (ii) live vaccine vectors, such as pox
virus, Salmonella typhimurium, and Vibrio cholerae vectors, that contain
polynucleotides of the invention (such vaccine vectors being useful in,
e.g., methods for preventing or treating Helicobacter infection) in
combination with a diluent or carrier, and related pharmaceutical
compositions and associated therapeutic and/or prophylactic methods; (iii)
therapeutic and/or prophylactic methods involving administration of
polynucleotide molecules, either in a naked form or formulated with a
delivery vehicle, polypeptides or mixtures of polypeptides, or monospecific
antibodies of the invention, and related pharmaceutical compositions; (iv)
methods for detecting the presence of Helicobacter in biological samples,
which can involve the use of polynucleotide molecules, monospecific
antibodies, or polypeptides of the invention; and (v) methods for purifying
polypeptides of the invention by antibody-based affinity chromatography.

Brief Description of the Drawings
Figure 1 is an alignment of the predicted amino acid sequences of GHPO 386
(SEQ ID NO:2), GHPO 789 (SEQ ID NO:4), and GHPO 1516 (SEQ ID NO:6), as well
as a consensus sequence for the 76 kDa protein family.

Figure 2 is an alignment of the predicted amino acid sequences of GHPO 1197
(SEQ ID NO:8), GHPO 1180 (SEQ ID NO:10), GHPO 896 (SEQ ID NO:12), GHPO 711
(SEQ ID NO:14), GHPO 190 (SEQ ID NO:16), GHPO 185 (SEQ ID NO:18), GHPO 1417
(SEQ ID NO:20), and GHPO 1414 (SEQ ID NO:22), as well as a consensus
sequence for the 76 kDa protein family.

Detailed Description

Open reading frames (ORFs) encoding a family of new, full length,
membrane-associated 76 kDa polypeptides, designated GHPO 386, GHPO 789,
GHPO 1516, GHPO 1197, GHPO 1180, GHPO 896, GHPO 711, GHPO 190, GHPO 185,
GHPO 1417, and GHPO 1414, a 32 kDa polypeptide, designated GHPO 1360, and a
50 kDa polypeptide, designated GHPO 750, have been identified in the H.
pylori genome. The amino acid sequences of the 76 kDa polypeptides are
aligned in Figures 1 and 2. The 76 kDa, 32 kDa, and 50 kDa polypeptides can
be used, for example, in vaccination methods for preventing or treating
Helicobacter infection. For example, GHPO 750, GHPO 1360, GHPO 190, and
GHPO 1516 have been shown to be protective antigens. By "protective
antigen" is meant an antigen that is capable of reducing the infection
level after challenge, relative to a positive control. Absolute protection
from infection, although included in the invention, is not required.

The polypeptides of the invention (except GHPO 750, see below) are secreted
polypeptides that can be produced in their mature forms (i.e., as
polypeptides that have been exported through class II or class III
secretion pathways) or as precursors that include a signal peptide, which
can be removed in the course of excretion/secretion by cleavage at the
N-terminal end of the mature form. (The cleavage site is located at the
C-terminal end of the signal peptide, adjacent to the mature form.) The
cleavage site for the polypeptides of the invention and, thus, the first
amino acid of the mature polypeptides, was putatively determined.
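
By way of illustration only, the relationship between a precursor, its
signal peptide, and the mature form can be sketched in Python as follows.
The sketch assumes the numbering convention used for the sequences below,
in which signal peptide residues carry negative positions, the first
residue of the mature form is position 1, and there is no position 0. The
function names and the toy sequence are hypothetical and are not taken from
the sequence listing.

```python
from typing import Tuple


def split_precursor(precursor: str, signal_len: int) -> Tuple[str, str]:
    """Split a precursor into (signal peptide, mature form)."""
    if not 0 <= signal_len <= len(precursor):
        raise ValueError("signal_len must lie within the precursor")
    return precursor[:signal_len], precursor[signal_len:]


def residue_at(precursor: str, signal_len: int, position: int) -> str:
    """Return the residue at a signed position.

    Signal peptide residues are numbered -signal_len..-1 and mature
    residues 1..N; position 0 is assumed not to exist.
    """
    if position == 0:
        raise ValueError("position 0 is not used in this numbering")
    # Negative positions count back from the cleavage site.
    if position < 0:
        index = signal_len + position
    else:
        index = signal_len + position - 1
    if not 0 <= index < len(precursor):
        raise ValueError("position outside the precursor")
    return precursor[index]


# Hypothetical 7-residue precursor with a 3-residue signal peptide.
toy_precursor = "MKKAQDE"
signal, mature = split_precursor(toy_precursor, 3)
```

Here position -3 is the first residue of the toy precursor and position 1
is the first residue of its mature form.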

According to a first aspect of the invention, there are provided isolated
polynucleotides that encode the precursor and mature forms of Helicobacter
GHPO 386, GHPO 789, GHPO 1516, GHPO 1197, GHPO 1180, GHPO 896, GHPO 711,
GHPO 190, GHPO 185, GHPO 1417, GHPO 1414, GHPO 1360, and GHPO 750.

An isolated polynucleotide of the invention encodes:

(i) a polypeptide having an amino acid sequence that is homologous to a
Helicobacter amino acid sequence of a polypeptide associated with the
Helicobacter membrane, the Helicobacter amino acid sequence being selected
from the group consisting of the amino acid sequences shown:

- in SEQ ID NO:2, beginning with an amino acid in any one of positions -19
to 5, preferably in position -19 or position 1, and ending with an amino
acid in position 689 (GHPO 386);

- in SEQ ID NO:4, beginning with an amino acid in any one of positions -20
to 5, preferably in position -20 or position 1, and ending with an amino
acid in position 713 (GHPO 789);

- in SEQ ID NO:6, beginning with an amino acid in any one of positions -20
to 5, preferably in position -20 or position 1, and ending with an amino
acid in position 725 (GHPO 1516);

- in SEQ ID NO:8, beginning with an amino acid in any one of positions -20
to 5, preferably in position -20 or position 1, and ending with an amino
acid in position 691 (GHPO 1197);

- in SEQ ID NO:10, beginning with an amino acid in any one of positions -20
to 5, preferably in position -20 or position 1, and ending with an amino
acid in position 652 (GHPO 1180);

- in SEQ ID NO:12, beginning with an amino acid in any one of positions -18
to 5, preferably in position -18 or position 1, and ending with an amino
acid in position 673 (GHPO 896);

- in SEQ ID NO:14, beginning with an amino acid in any one of positions -21
to 5, preferably in position -21 or position 1, and ending with an amino
acid in position 619 (GHPO 711);

- in SEQ ID NO:16, beginning with an amino acid in any one of positions -17
to 5, preferably in position -17 or position 1, and ending with an amino
acid in position 635 (GHPO 190);

- in SEQ ID NO:18, beginning with an amino acid in any one of positions -19
to 5, preferably in position -19 or position 1, and ending with an amino
acid in position 626 (GHPO 185);

- in SEQ ID NO:20, beginning with an amino acid in any one of positions -16
to 5, preferably in position -16 or position 1, and ending with an amino
acid in position 467 (GHPO 1417);

- in SEQ ID NO:22, beginning with an amino acid in any one of positions -18
to 5, preferably in position -18 or position 1, and ending with an amino
acid in position 673 (GHPO 1414);

- in SEQ ID NO:66, beginning with an amino acid in any one of positions -20
to 5, preferably in position -20 or position 1, and ending with an amino
acid in position 279 (GHPO 1360); and

- in SEQ ID NO:68, beginning with an amino acid in position 1 and ending
with an amino acid in position 399 (GHPO 750); or

(ii) a derivative of the polypeptide.

The term "isolated polynucleotide" is defined as a polynucleotide that is
removed from the environment in which it naturally occurs. For example, a
naturally-occurring DNA molecule present in the genome of a living
bacterium or as part of a gene bank is not isolated, but the same molecule,
separated from the remaining part of the bacterial genome, as a result of,
e.g., a cloning event (amplification), is "isolated." Typically, an
isolated DNA molecule is free from DNA regions (e.g., coding regions) with
which it is immediately contiguous, at the 5' or 3' ends, in the naturally
occurring genome. Such isolated polynucleotides can be part of a vector or
a composition and still be isolated, as such a vector or composition is not
part of its natural environment.

A polynucleotide of the invention can consist of RNA or DNA (e.g., cDNA,
genomic DNA, or synthetic DNA), or modifications or combinations of RNA or
DNA. The polynucleotide can be double-stranded or single-stranded and, if
single-stranded, can be the coding (sense) strand or the non-coding
(anti-sense) strand. The sequences that encode polypeptides of the
invention, as shown in SEQ ID NOs:2-22 (even numbers), 66, and 68, can be
(a) the coding sequence as shown in SEQ ID NOs:1-21 (odd numbers), 65, and
67; (b) a ribonucleotide sequence derived by transcription of (a); or (c) a
different coding sequence that, as a result of the redundancy or degeneracy
of the genetic code, encodes the same polypeptides as the polynucleotide
molecules having the sequences illustrated in any of SEQ ID NOs:1-21 (odd
numbers), 65, and 67. The polypeptides of the invention can be ones that
are naturally secreted or excreted by, e.g., H. felis, H. mustelae, H.
heilmanii, or H. pylori.
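
As a rough illustration of the degeneracy referred to in (c) above, the
following Python sketch counts how many distinct coding sequences can
encode a given peptide. It assumes the standard genetic code, in which 61
sense codons are shared among the 20 amino acids. The function name and the
example peptides are hypothetical.

```python
from math import prod

# Number of sense codons per amino acid in the standard genetic code.
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_coding_sequences(peptide: str) -> int:
    """Count the distinct codon strings that translate to `peptide`."""
    return prod(CODONS_PER_RESIDUE[residue] for residue in peptide.upper())


# Methionine and tryptophan each have a single codon; leucine and
# serine each have six.
unique_case = count_coding_sequences("MW")
degenerate_case = count_coding_sequences("LS")
```

The count grows multiplicatively with peptide length, which is why a
full-length polypeptide can be encoded by a very large number of different
polynucleotide sequences.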
By "polypeptide" or "protein" is meant any chain of amino acids,
regardless of length or post-translational modification (e.g., glycosylation or
phosphorylation). Both terms are used interchangeably in the present
application.

By "homologous amino acid sequence" is meant an amino acid sequence that
differs from an amino acid sequence shown in any of SEQ ID NOs:2-22 (even
numbers), 66, and 68, or an amino acid sequence encoded by the nucleotide
sequence of any of SEQ ID NOs:1-21 (odd numbers), 65, and 67, by one or
more non-conservative amino acid substitutions, deletions, or additions
located at positions at which they do not destroy the specific antigenicity
of the polypeptide. Preferably, such a sequence is at least 75%, more
preferably at least 80%, and most preferably at least 90% identical to an
amino acid sequence shown in any of SEQ ID NOs:2-22 (even numbers), 66, and
68.

Homologous amino acid sequences include sequences that are identical or
substantially identical to an amino acid sequence as shown in any of SEQ ID
NOs:2-22 (even numbers), 66, and 68. By "amino acid sequence that is
substantially identical" is meant a sequence that is at least 90%,
preferably at least 95%, more preferably at least 97%, and most preferably
at least 99% identical to an amino acid sequence of reference and that
differs from the sequence of reference, if at all, by a majority of
conservative amino acid substitutions.

Conservative amino acid substitutions typically include substitutions among
amino acids of the same class. These classes include, for example, amino
acids having uncharged polar side chains, such as asparagine, glutamine,
serine, threonine, and tyrosine; amino acids having basic side chains, such
as lysine, arginine, and histidine; amino acids having acidic side chains,
such as aspartic acid and glutamic acid; and amino acids having nonpolar
side chains, such as glycine, alanine, valine, leucine, isoleucine,
proline, phenylalanine, methionine, tryptophan, and cysteine.
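
The four classes listed above can be expressed, purely as an illustrative
sketch, as a lookup table in Python. The one-letter codes, the class
labels, and the function name are assumptions introduced for the example. A
substitution is treated here as conservative when the two residues differ
but fall in the same class.

```python
# Side-chain classes as listed in the text, keyed by hypothetical labels.
SIDE_CHAIN_CLASSES = {
    "uncharged_polar": set("NQSTY"),
    "basic": set("KRH"),
    "acidic": set("DE"),
    "nonpolar": set("GAVLIPFMWC"),
}

# Invert the table so each residue maps to its class label.
CLASS_OF = {
    residue: label
    for label, residues in SIDE_CHAIN_CLASSES.items()
    for residue in residues
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if the two residues differ but share a side-chain class."""
    a, b = original.upper(), replacement.upper()
    return a != b and CLASS_OF[a] == CLASS_OF[b]


# Lysine to arginine stays within the basic class; aspartic acid to
# lysine moves from the acidic class to the basic class.
basic_swap = is_conservative("K", "R")
charge_reversal = is_conservative("D", "K")
```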

Homology can be measured using sequence analysis software (e.g., Sequence
Analysis Software Package of the Genetics Computer Group, University of
Wisconsin Biotechnology Center, 1710 University Avenue, Madison, WI 53705).
Similar amino acid sequences are aligned to obtain the maximum degree of
homology (i.e., identity). To this end, it may be necessary to artificially
introduce gaps into the sequence. Once the optimal alignment has been set
up, the degree of homology (i.e., identity) is established by recording all
of the positions in which the amino acids of both sequences are identical,
relative to the total number of positions.
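
The identity calculation described above can be sketched in Python as
follows. The sketch assumes that the two sequences have already been
aligned (gaps written as "-") and that "the total number of positions"
means the length of the alignment, including gapped columns; other
denominators are sometimes used. The function name and the toy alignment
are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as identical only if both rows hold the same
    # residue and that residue is not a gap character.
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * identical / len(aligned_a)


# Toy alignment: 4 identical columns out of 6.
toy_identity = percent_identity("MKT-LV", "MRTALV")
```

The resulting percentage can then be compared with the thresholds recited
above (e.g., at least 75%, 80%, or 90%).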

Homologous polynucleotide sequences are defined in a similar way.
Preferably, a homologous sequence is one that is at least 45%, more
preferably at least 60%, and most preferably at least 85% identical to a
coding sequence of any of SEQ ID NOs:1-21 (odd numbers), 65, and 67.

Polypeptides having a sequence homologous to one of the sequences shown in
SEQ ID NOs:2-22 (even numbers), 66, and 68 include naturally-occurring
allelic variants, as well as mutants or any other non-naturally occurring
variants that are analogous, in terms of antigenicity, to a polypeptide
having a sequence as shown in SEQ ID NOs:2-22 (even numbers), 66, and 68.

As is known in the art, an allelic variant is an alternate form of a
polypeptide that is characterized as having a substitution, deletion, or
addition of one or more amino acids that does not alter the biological
function of the polypeptide. By "biological function" is meant a function
of the polypeptide in the cells in which it naturally occurs, even if the
function is not necessary for the growth or survival of the cells. For
example, the biological function of a porin is to allow the entry into
cells of compounds present in the extracellular medium. The biological
function is distinct from the antigenic function. A polypeptide can have
more than one biological function.

Allelic variants are very common in nature. For example, a bacterial
species, e.g., H. pylori, is usually represented by a variety of strains
that differ from each other by minor allelic variations. Indeed, a
polypeptide that fulfills the same biological function in different strains
can have an amino acid sequence that is not identical in each of the
strains. Such an allelic variation can be equally reflected at the
polynucleotide level.

Support for the use of allelic variants of polypeptide antigens comes from,
e.g., studies of the Helicobacter urease antigen. The amino acid sequence
of Helicobacter urease varies widely from species to species, yet
cross-species protection occurs, indicating that the urease molecule, when
used as an immunogen, is highly tolerant of amino acid variations. Even
among different strains of the single species H. pylori, there are amino
acid sequence variations.

For example, although the amino acid sequences of the UreA and UreB
subunits of H. pylori and H. felis ureases differ from one another by 26.5%
and 11.8%, respectively (Ferrero et al., Molecular Microbiology
9(2):323-333, 1993), it has been shown that H. pylori urease protects mice
from H. felis infection (Michetti et al., Gastroenterology 107:1002, 1994).
In addition, it has been shown that the individual structural subunits of
urease, UreA and UreB, which contain distinct amino acid sequences, are
both protective antigens against Helicobacter infection (Michetti et al.,
supra). Similarly, Cuenca et al. (Gastroenterology 110:1770, 1996) showed
that therapeutic immunization of H. mustelae-infected ferrets with H.
pylori urease was effective at eradicating H. mustelae infection. Further,
several urease
variants have been reported to be effective vaccine antigens, including,
e.g., recombinant UreA + UreB apoenzyme expressed from pORV142 (UreA and
UreB sequences derived from H. pylori strain CPM630; Lee et al., J. Infect.
Dis. 172:161, 1995); recombinant UreA + UreB apoenzyme expressed from
pORV214 (UreA and UreB sequences differ from H. pylori strain CPM630 by one
and two amino acid changes, respectively; Lee et al., supra, 1995); a
UreA-glutathione-S-transferase fusion protein (UreA sequence from H. pylori
strain ATCC 43504; Thomas et al., Acta Gastro-Enterologica Belgica 56:54,
1993); UreA + UreB holoenzyme purified from H. pylori strain NCTC11637
(Marchetti et al., Science 267:1655, 1995); a UreA-MBP fusion protein (UreA
from H. pylori strain 85P; Ferrero et al., Infection and Immunity 62:4981,
1994); a UreB-MBP fusion protein (UreB from H. pylori strain 85P; Ferrero
et al., supra); a UreA-MBP fusion protein (UreA from H. felis strain ATCC
49179; Ferrero et al., supra); a UreB-MBP fusion protein (UreB from H.
felis strain ATCC 49179; Ferrero et al., supra); and a 37 kDa fragment of
UreB containing amino acids 220-569 (Dore-Davin et al., "A 37 kD fragment
of UreB is sufficient to confer protection against Helicobacter felis
infection in mice"). Finally, Thomas et al. (supra) showed that oral
immunization of mice with crude sonicates of H. pylori protected mice from
subsequent challenge with H. felis.

Polynucleotides, e.g., DNA molecules, encoding allelic variants can easily
be obtained by polymerase chain reaction (PCR) amplification of genomic
bacterial DNA extracted by conventional methods. This involves the use of
synthetic oligonucleotide primers matching sequences that are upstream and
downstream of the 5' and 3' ends of the coding region. Suitable primers can
be designed based on the nucleotide sequence information provided in SEQ ID
NOs:1-21 (odd numbers), 65, and 67. Typically, a primer consists of 10 to
40, preferably 15 to 25 nucleotides. It can also be advantageous to select
primers containing C and G nucleotides in proportions sufficient to ensure
efficient hybridization, e.g., an amount of C and G nucleotides of at least
40%, preferably 50%, of the total nucleotide amount. Those skilled in the
art can readily design primers that can be used to isolate the
polynucleotides of the invention from different Helicobacter strains.
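
The length and C + G guidelines given above lend themselves to a simple
check, sketched here in Python for illustration. The default thresholds
mirror the ranges stated in the text (10 to 40 nucleotides, at least 40% C
and G); the function names and the example primer are hypothetical and do
not correspond to any primer of the sequence listing.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of the primer's nucleotides that are C or G."""
    seq = primer.upper()
    if not seq:
        raise ValueError("primer is empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def meets_guidelines(
    primer: str,
    min_len: int = 10,
    max_len: int = 40,
    min_gc: float = 0.40,
) -> bool:
    """True if the primer satisfies the stated length and C + G ranges."""
    return min_len <= len(primer) <= max_len and gc_fraction(primer) >= min_gc


# Hypothetical 20-nucleotide primer with 10 C/G nucleotides.
toy_primer = "ATGCGCAAAGCTTTAGGCCT"
toy_ok = meets_guidelines(toy_primer)
```

Passing min_len=15, max_len=25, and min_gc=0.50 applies the preferred
ranges instead. Such a check addresses only the two criteria named in the
text and does not replace experimental validation.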

As an example, primers useful for cloning a polynucleotide molecule
encoding a polypeptide having the amino acid sequence of unprocessed GHPO
386 (SEQ ID NO:2), including a signal peptide, are shown in SEQ ID NO:23
(matching at the 5' end) and in SEQ ID NO:25 (matching at the 3' end).
Primers useful for cloning a DNA molecule encoding a polypeptide having the
amino acid sequence of mature GHPO 386 (amino acids 1-689 of SEQ ID NO:2),
lacking a signal peptide, are shown in SEQ ID NO:24 (matching at the 5'
end) and in SEQ ID NO:25 (matching at the 3' end). Primers useful for
cloning a DNA molecule encoding a polypeptide having the amino acid
sequence of GHPO 1360 (SEQ ID NO:66) are shown in SEQ ID NO:78 (matching at
the 5' end) and in SEQ ID NO:79 (matching at the 3' end). Use of these
primers enables amplification of the entire gene encoding GHPO 1360.
Primers having sequences shown in SEQ ID NO:82 (matching at the 5' end of
the coding sequence corresponding to the mature protein) and SEQ ID NO:79
(matching at the 3' end) can be used to amplify the portion of the gene
encoding mature GHPO 1360. Experimental conditions for carrying out PCR can
readily be determined by one skilled in the art and illustrations of
carrying out PCR are provided in Examples 3 and 4.

Thus, the first aspect of the invention includes:

(i) isolated polynucleotide molecules (e.g., DNA molecules) that can be
amplified and/or cloned using the polymerase chain reaction from a
Helicobacter, e.g., H. pylori, genome using either:

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:23,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:25
(unprocessed GHPO 386);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:26,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:28
(unprocessed GHPO 789);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:29,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:31
(unprocessed GHPO 1516);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:32,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:34
(unprocessed GHPO 1197);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:35,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:37
(unprocessed GHPO 1180);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:38,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:40
(unprocessed GHPO 896);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:41,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:43
(unprocessed GHPO 711);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:44,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:46
(unprocessed GHPO 190);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:47,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:49
(unprocessed GHPO 185);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:50,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:52
(unprocessed GHPO 1417);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:53,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:55
(unprocessed GHPO 1414);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:78,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:79
(unprocessed GHPO 1360); or

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:80,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:81
(GHPO 750); and

(ii) isolated polynucleotide molecules (e.g., DNA molecules) that can be
amplified and/or cloned by the polymerase chain reaction from a
Helicobacter, e.g., H. pylori, genome using either:

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:24,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:25
(mature GHPO 386);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:27,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:28
(mature GHPO 789);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:30,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:31
(mature GHPO 1516);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:33,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:34
(mature GHPO 1197);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:36,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:37
(mature GHPO 1180);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:39,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:40
(mature GHPO 896);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:42,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:43
(mature GHPO 711);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:45,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:46
(mature GHPO 190);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:48,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:49
(mature GHPO 185);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:51,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:52
(mature GHPO 1417);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:54,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:55
(mature GHPO 1414); or

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID NO:82,
and a 3' oligonucleotide primer having a sequence as shown in SEQ ID NO:79
(mature GHPO 1360).
The 5' ends of the primers described above can advantageously include a
restriction endonuclease recognition site that contains, typically, 4 to 6
nucleotides. For example, the sequences 5'-GGATCC-3' (BamHI) or
5'-CTCGAG-3' (XhoI) can be used. Restriction sites can be selected by those
skilled in the art so that the amplified DNA, when digested, if necessary,
can be conveniently cloned into an appropriately digested vector, such as a
plasmid vector. In addition, a 5' clamp (e.g., GCC) can be included in the
primers 5' to the restriction endonuclease recognition site.
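
The primer layout described above (5' clamp, then restriction site, then
gene-specific sequence) can be sketched in Python as follows. The BamHI and
XhoI recognition sequences and the GCC clamp are those given in the text;
the function names and the short gene-specific sequences are hypothetical
placeholders, not sequences of the invention.

```python
RESTRICTION_SITES = {"BamHI": "GGATCC", "XhoI": "CTCGAG"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement, as needed for a primer matching the 3' end."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def add_cloning_tail(gene_specific: str, enzyme: str, clamp: str = "GCC") -> str:
    """Prefix a primer with a 5' clamp followed by a restriction site."""
    return clamp + RESTRICTION_SITES[enzyme] + gene_specific.upper()


# Hypothetical forward primer: clamp + BamHI site + start of a coding region.
forward = add_cloning_tail("ATGAAAAAG", "BamHI")

# Hypothetical reverse primer: clamp + XhoI site + reverse complement of
# the end of a coding region.
reverse = add_cloning_tail(reverse_complement("AAGCTTTAA"), "XhoI")
```

Using two different sites, one per primer, is a common way to obtain
directional cloning, although the choice of sites depends on the vector and
on the absence of the same sites within the amplified DNA.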

Useful homologs that do not occur naturally can be designed using known
methods for identifying regions of an antigen that are likely to be
tolerant of amino acid sequence changes and/or deletions. For example,
sequences of the antigen from different species can be compared to identify
conserved sequences.

Polypeptide derivatives that are encoded by polynucleotides of the
invention include, e.g., fragments, polypeptides having large internal
deletions derived from full-length polypeptides, and fusion proteins.
Polypeptide fragments of the invention can be derived from a polypeptide
having a sequence homologous to the sequences of any of SEQ ID NOs:2-22
(even numbers), 66, and 68, to the extent that the fragments retain the
substantial antigenicity of the parent polypeptide (specific antigenicity).
Polypeptide derivatives can also be constructed by large internal deletions
that remove a substantial part of the parent polypeptide, while retaining
specific antigenicity. Generally, polypeptide derivatives should be at
least about 12 amino acids in length to maintain antigenicity.
Advantageously, they can be at least 20 amino acids, preferably at least 50
amino acids, more preferably at least 75 amino acids, and most preferably
at least 100 amino acids in length.

Useful polypeptide derivatives, e.g., polypeptide fragments, can be
designed using computer-assisted analysis of amino acid sequences in order
to identify sites in protein antigens having potential as surface-exposed,
antigenic regions (Hughes et al., Infect. Immun. 60(9):3497, 1992). For
example, the Laser Gene Program from DNA Star can be used to obtain
hydrophilicity, antigenic index, and intensity index plots for the
polypeptides of the invention. This program can also be used to obtain
information about homologies of the polypeptides with known protein motifs.
One skilled in the art can readily use the information provided in such
plots to select peptide fragments for use as vaccine antigens. For example,
fragments spanning regions of the plots in which the antigenic index is
relatively high can be selected. One can also select fragments spanning
regions in which both the antigenic index and the intensity plots are
relatively high. Fragments containing conserved sequences, particularly
hydrophilic conserved sequences, can also be selected.
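
To give a feel for how a hydrophilicity plot can guide fragment selection,
the following Python sketch computes a sliding-window average over a
sequence and reports the most hydrophilic window. It is a stand-in only: it
uses the published Kyte-Doolittle hydropathy scale (on which lower values
are more hydrophilic), not the antigenic index or intensity index of the
program named above, and the window size, function names, and toy sequence
are assumptions made for the example.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values; more negative means more hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(sequence: str, window: int = 7) -> List[float]:
    """Mean hydropathy of every window of the given size."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def most_hydrophilic_window(sequence: str, window: int = 7) -> Tuple[int, str]:
    """Return (0-based start, fragment) of the lowest-scoring window."""
    means = window_means(sequence, window)
    start = min(range(len(means)), key=means.__getitem__)
    return start, sequence.upper()[start:start + window]


# Toy sequence: a charged stretch flanked by leucine runs.
toy_sequence = "LLLLLLLKDEKRDELLLLLLL"
toy_start, toy_fragment = most_hydrophilic_window(toy_sequence)
```

A real selection would combine several such profiles with conservation
among strains, as the text indicates, rather than rely on a single scale.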

Polypeptide fragments and polypeptides having large internal deletions can
be used for revealing epitopes that are otherwise masked in the parent
polypeptide and that may be of importance for inducing a protective T
cell-dependent immune response. Deletions can also remove immunodominant
regions of high variability among strains.

It is an accepted practice in the field of immunology to use fragments and
variants of protein immunogens as vaccines, as all that is required to
induce an immune response to a protein is a small (e.g., 8 to 10 amino
acids) immunogenic region of the protein. This has been done for a number
of vaccines against pathogens other than Helicobacter. For example, short
synthetic peptides corresponding to surface-exposed antigens of pathogens
such as murine mammary tumor virus (peptide containing 11 amino acids; Dion
et al., Virology 179:474-477, 1990), Semliki Forest virus (peptide
containing 16 amino acids; Snijders et al., J. Gen. Virol. 72:557-565,
1991), and canine parvovirus (2 overlapping peptides, each containing 15
amino acids; Langeveld et al., Vaccine 12(15):1473-1480, 1994) have been
shown to be effective vaccine antigens against their respective pathogens.

Polynucleotides encoding polypeptide fragments and polypeptides
having large internal deletions can be constructed using standard methods (see,
e.g., Ausubel et al., Current Protocols in Molecular Biology, John Wiley &
Sons Inc., 1994), for example, by PCR, including inverse PCR, by restriction
enzyme treatment of the cloned DNA molecules, or by the method of Kunkel et
al. (Proc. Natl. Acad. Sci. USA 82:448, 1985; biological material available at
Stratagene).

A polypeptide derivative can also be produced as a fusion
polypeptide that contains a polypeptide or a polypeptide derivative of the
invention fused, e.g., at the N- or C-terminal end, to any other polypeptide
(hereinafter referred to as a peptide tail). Such a product can be easily obtained
by translation of a genetic fusion, i.e., a hybrid gene. Vectors for expressing
fusion polypeptides are commercially available, and include the pMal-c2 or
pMal-p2 systems of New England Biolabs, in which the peptide tail is a
maltose binding protein, the glutathione-S-transferase system of Pharmacia, or
the His-Tag system available from Novagen. These and other expression
systems provide convenient means for further purification of polypeptides and
derivatives of the invention.

Another particular example of fusion polypeptides included in the
invention includes a polypeptide or polypeptide derivative of the invention
fused to a polypeptide having adjuvant activity, such as, e.g., subunit B of
either cholera toxin or E. coli heat-labile toxin. Several possibilities can be
used for producing such fusion proteins. First, the polypeptide of the invention
can be fused to the N-terminal end or, preferably, to the C-terminal end of the
polypeptide having adjuvant activity. Second, a polypeptide fragment of the
invention can be fused within the amino acid sequence of the polypeptide having
adjuvant activity. Spacer sequences can also be included, if desired.

As stated above, the polynucleotides of the invention encode
Helicobacter polypeptides in precursor or mature form. They can also encode
hybrid precursors containing heterologous signal peptides, which can mature
into polypeptides of the invention. By "heterologous signal peptide" is meant a
signal peptide that is not found in the naturally-occurring precursor of a
polypeptide of the invention.

A polynucleotide of the invention hybridizes, preferably under
stringent conditions, to a polynucleotide having a sequence as shown in any of
SEQ ID NOs:1-21 (odd numbers), 65, and 67. Hybridization procedures are,
e.g., described by Ausubel et al. (supra); Silhavy et al. (Experiments with Gene
Fusions, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New
York, 1984); and Davis et al. (A Manual for Genetic Engineering: Advanced
Bacterial Genetics, Cold Spring Harbor Laboratory Press, Cold Spring Harbor,
New York, 1980). Important parameters that can be considered for optimizing
hybridization conditions are reflected in the following formula, which
facilitates calculation of the melting temperature (Tm), which is the
temperature above which two complementary DNA strands separate from one
another (Casey et al., Nucl. Acid Res. 4:1539, 1977): Tm = 81.5 + 0.5 x (%
G+C) + 1.6 log (positive ion concentration) - 0.6 x (% formamide). Under
appropriate stringency conditions, hybridization temperature (Th) is
approximately 20 to 40°C, 20 to 25°C, or, preferably, 30 to 40°C below the
calculated Tm. Those skilled in the art will understand that optimal
temperature and salt conditions can be readily determined empirically in
preliminary experiments using conventional procedures. For example,
stringent conditions can be achieved, both for pre-hybridizing and hybridizing
incubations, (i) within 4-16 hours at 42°C, in 6 x SSC containing
50% formamide or (ii) within 4-16 hours at 65°C in an aqueous 6 x SSC
solution (1 M NaCl, 0.1 M sodium citrate (pH 7.0)). For polynucleotides
containing 30 to 600 nucleotides, the above formula is used and then is
corrected by subtracting (600/polynucleotide size in base pairs). Stringency
conditions are defined by a Th that is 5 to 10°C below Tm.
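
The calculation described above can be sketched as follows. This is a minimal
illustration only: the coefficients are copied from the formula as printed in
this passage and have not been independently checked, the logarithm is assumed
to be base 10, the ion concentration is assumed to be in mol/L, and the function
names are hypothetical.

```python
import math


def tm_long(gc_percent, cation_molar, formamide_percent=0.0, length_bp=None):
    """Melting temperature (deg C) using the coefficients printed in the text.

    If `length_bp` is between 30 and 600, the 600/length correction
    described in the text is subtracted.
    """
    tm = (81.5
          + 0.5 * gc_percent
          + 1.6 * math.log10(cation_molar)   # assumption: log means log10
          - 0.6 * formamide_percent)
    if length_bp is not None and 30 <= length_bp <= 600:
        tm -= 600.0 / length_bp
    return tm


def th_window(tm, low=5.0, high=10.0):
    """Hybridization temperature range lying `low` to `high` deg C below Tm."""
    return (tm - high, tm - low)


if __name__ == "__main__":
    # Hypothetical probe: 300 bp, 50% G+C, 1 M cation, 50% formamide.
    tm = tm_long(50, 1.0, formamide_percent=50, length_bp=300)
    print(tm, th_window(tm))
```

As the text notes, a calculated value of this kind is a starting point;
optimal temperature and salt conditions are determined empirically.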

Hybridization conditions with oligonucleotides shorter than 20-30
bases do not precisely follow the rules set forth above. In such cases, the
formula for calculating the Tm is as follows: Tm = 4 x (G+C) + 2 (A+T). For
example, an 18 nucleotide fragment of 50% G+C would have an approximate
Tm of 54°C.
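
The short-oligonucleotide rule above is simple enough to check by hand or with
a few lines of code. The sketch below is illustrative only; the function name
and the example sequence are hypothetical, and bases other than A, C, G, and T
are simply ignored.

```python
def tm_short(oligo):
    """Approximate Tm (deg C) for a short oligonucleotide: 4 x (G+C) + 2 x (A+T)."""
    s = oligo.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    return 4 * gc + 2 * at


if __name__ == "__main__":
    # Hypothetical 18-mer with 9 G/C and 9 A/T bases (50% G+C).
    print(tm_short("GCGCGCGCGATATATATA"))
```

For the 18-mer in the example, 4 x 9 + 2 x 9 gives the 54°C stated in the text.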

A polynucleotide molecule of the invention, containing RNA, DNA,
or modifications or combinations thereof, can have various applications. For
example, a polynucleotide molecule can be used (i) in a process for producing
the encoded polypeptide in a recombinant host system, (ii) in the construction
of vaccine vectors such as poxviruses, which are further used in methods and
compositions for preventing and/or treating Helicobacter infection, (iii) as a
vaccine agent, in a naked form or formulated with a delivery vehicle, and (iv)
in the construction of attenuated Helicobacter strains that can over-express a
polynucleotide of the invention or express it in a non-toxic, mutated form.

According to a second aspect of the invention, there is therefore
provided (i) an expression cassette containing a polynucleotide molecule of the
invention placed under the control of elements (e.g., a promoter) required for
expression; (ii) an expression vector containing an expression cassette of the
invention; (iii) a procaryotic or eucaryotic cell transformed or transfected with
an expression cassette and/or vector of the invention; as well as (iv) a process
for producing a polypeptide or polypeptide derivative encoded by a
polynucleotide of the invention, which involves culturing a procaryotic or
eucaryotic cell transformed or transfected with an expression cassette and/or
vector of the invention, under conditions that allow expression of the
polynucleotide molecule of the invention, and recovering the encoded
polypeptide or polypeptide derivative from the cell culture.

A recombinant expression system can be selected from procaryotic
and eucaryotic hosts. Eucaryotic hosts include, for example, yeast cells (e.g.,
Saccharomyces cerevisiae or Pichia pastoris), mammalian cells (e.g., COS1,
NIH3T3, or JEG3 cells), arthropod cells (e.g., Spodoptera frugiperda (SF9)
cells), and plant cells. Preferably, a procaryotic host such as E. coli is used.
Bacterial and eucaryotic cells are available from a number of different sources
that are known to those skilled in the art, e.g., the American Type Culture
Collection (ATCC; Rockville, Maryland).

The choice of the expression cassette will depend on the host system
selected, as well as the features desired for the expressed polypeptide. For
example, it may be useful to produce a polypeptide of the invention in a
particular lipidated form or any other form. Typically, an expression cassette
includes a constitutive or inducible promoter that is functional in the selected
host system; a ribosome binding site; a start codon (ATG); if necessary, a
region encoding a signal peptide, e.g., a lipidation signal peptide; a
polynucleotide molecule of the invention; a stop codon; and, optionally, a 3'
terminal region (translation and/or transcription terminator). The signal
peptide-encoding region is adjacent to the polynucleotide of the invention and
is placed in the proper reading frame. The signal peptide-encoding region can
be homologous or heterologous to the polynucleotide molecule encoding the
mature polypeptide and it can be specific to the secretion apparatus of the host
used for expression. The open reading frame constituted by the polynucleotide
molecule of the invention, alone or together with the signal peptide, is placed
under the control of the promoter so that transcription and translation occur in
the host system. Promoters and signal peptide-encoding regions are widely
known and available to those skilled in the art and include, for example, the
promoter of Salmonella typhimurium (and derivatives) that is inducible by
arabinose (promoter araB) and is functional in Gram-negative bacteria such as
E. coli (U.S. Patent No. 5,028,530; Cagnon et al., Protein Engineering
4(7):843, 1991); the promoter of the bacteriophage T7 RNA polymerase gene,
which is functional in a number of E. coli strains expressing T7 polymerase
(U.S. Patent No. 4,952,496); the OspA lipidation signal peptide; and the RlpB
lipidation signal peptide (Takase et al., J. Bact. 169:5692, 1987).

The expression cassette is typically part of an expression vector,
which is selected for its ability to replicate in the chosen expression system.
Expression vectors (e.g., plasmids or viral vectors) can be chosen from, for
example, those described in Pouwels et al. (Cloning Vectors: A Laboratory
Manual, 1985, Supp. 1987) and can be purchased from various commercial
sources. Methods for transforming or transfecting host cells with expression
vectors are well known in the art and will depend on the host system selected,
as described in Ausubel et al. (supra).

Upon expression, a recombinant polypeptide of the invention (or a
polypeptide derivative) is produced and remains in the intracellular
compartment, is secreted/excreted in the extracellular medium or in the
periplasmic space, or is embedded in the cellular membrane. The polypeptide
can then be recovered in a substantially purified form from the cell extract or
from the supernatant after centrifugation of the cell culture. Typically, the
recombinant polypeptide can be purified by antibody-based affinity purification
or by any other method known to a person skilled in the art, such as by genetic
fusion to a small affinity-binding domain. Antibody-based affinity purification
methods are also available for purifying a polypeptide of the invention
extracted from a Helicobacter strain. Antibodies useful for immunoaffinity
purification of the polypeptides of the invention can be obtained using methods
described below.

Polynucleotides of the invention can also be used in DNA
vaccination methods, using either a viral or bacterial host as gene delivery
vehicle (live vaccine vector) or administering the gene in a free form, e.g.,
inserted into a plasmid. Therapeutic or prophylactic efficacy of a
polynucleotide of the invention can be evaluated as is described below.

Accordingly, in a third aspect of the invention, there is provided (i) a
vaccine vector such as a poxvirus, containing a polynucleotide molecule of the
invention placed under the control of elements required for expression; (ii) a
composition of matter containing a vaccine vector of the invention, together
with a diluent or carrier; (iii) a pharmaceutical composition containing a
therapeutically or prophylactically effective amount of a vaccine vector of the
invention; (iv) a method for inducing an immune response against Helicobacter
in a mammal (e.g., a human; alternatively, the method can be used in veterinary
applications for treating or preventing Helicobacter infection of animals, e.g.,
cats or birds), which involves administering to the mammal an
immunogenically effective amount of a vaccine vector of the invention to elicit
an immune response, e.g., a protective or therapeutic immune response to
Helicobacter; and (v) a method for preventing and/or treating a Helicobacter
(e.g., H. pylori, H. felis, H. mustelae, or H. heilmanii) infection, which involves
administering a prophylactic or therapeutic amount of a vaccine vector of the
invention to an individual in need. Additionally, the third aspect of the
invention encompasses the use of a vaccine vector of the invention in the
preparation of a medicament for preventing and/or treating Helicobacter
infection.

A vaccine vector of the invention can express one or several
polypeptides or derivatives of the invention, as well as at least one additional
Helicobacter antigen such as a urease apoenzyme or a subunit, fragment,
homolog, mutant, or derivative thereof. In addition, it can express a cytokine,
such as interleukin-2 (IL-2) or interleukin-12 (IL-12), that enhances the
immune response. Thus, a vaccine vector can include additional
polynucleotide molecules encoding, e.g., urease subunit A, B, or both, or a
cytokine, placed under the control of elements required for expression in a
mammalian cell.

Alternatively, a composition of the invention can include several
vaccine vectors, each of which is capable of expressing a polypeptide or
derivative of the invention. A composition can also contain a vaccine vector
capable of expressing an additional Helicobacter antigen such as urease
apoenzyme, a subunit, fragment, homolog, mutant, or derivative thereof, or a
cytokine such as IL-2 or IL-12.

In vaccination methods for treating or preventing infection in a
mammal, a vaccine vector of the invention can be administered by any
conventional route in use in the vaccine field, for example, to a mucosal (e.g.,
ocular, intranasal, oral, gastric, pulmonary, intestinal, rectal, vaginal, or urinary
tract) surface or via a parenteral (e.g., subcutaneous, intradermal,
intramuscular, intravenous, or intraperitoneal) route. Preferred routes depend
upon the choice of the vaccine vector. The administration can be achieved in a
single dose or repeated at intervals. The appropriate dosage depends on various
parameters that are understood by those skilled in the art, such as the nature of
the vaccine vector itself, the route of administration, and the condition of the
mammal to be vaccinated (e.g., the weight, age, and general health of the
mammal).

Live vaccine vectors that can be used in the invention include viral
vectors, such as adenoviruses and poxviruses, as well as bacterial vectors, e.g.,
Shigella, Salmonella, Vibrio cholerae, Lactobacillus, Bacille bilié de
Calmette-Guérin (BCG), and Streptococcus. An example of an adenovirus vector, as
well as a method for constructing an adenovirus vector capable of expressing a
polynucleotide molecule of the invention, is described in U.S. Patent No.
4,920,209. Poxvirus vectors that can be used in the invention include, e.g.,
vaccinia and canary pox viruses, which are described in U.S. Patent No.
4,722,848 and U.S. Patent No. 5,364,773, respectively (also see, e.g., Tartaglia
et al., Virology 188:217, 1992, for a description of a vaccinia virus vector, and
Taylor et al., Vaccine 13:539, 1995, for a description of a canary poxvirus
vector). Poxvirus vectors capable of expressing a polynucleotide of the
invention can be obtained by homologous recombination, as described in Kieny
et al. (Nature 312:163, 1984), so that the polynucleotide of the invention is
inserted in the viral genome under appropriate conditions for expression in
mammalian cells. Generally, the dose of viral vector vaccine, for therapeutic
or prophylactic use, can be from about 1x10^4 to about 1x10^11, advantageously
from about 1x10^7 to about 1x10^10, or, preferably, from about 1x10^7 to about
1x10^9 plaque-forming units per kilogram. Preferably, viral vectors are
administered parenterally, for example, in 3 doses that are 4 weeks apart.
Those skilled in the art will recognize that it is preferable to avoid adding a
chemical adjuvant to a composition containing a viral vector of the invention,
thereby minimizing the immune response to the viral vector itself.

Non-toxicogenic Vibrio cholerae mutant strains that can be used in
live oral vaccines are described by Mekalanos et al. (Nature 306:551, 1983)
and in U.S. Patent No. 4,882,278 (strain in which a substantial amount of the
coding sequence of each of the two ctxA alleles has been deleted so that no
functional cholera toxin is produced); WO 92/11354 (strain in which the irgA
locus is inactivated by mutation; this mutation can be combined in a single
strain with ctxA mutations); and WO 94/1533 (deletion mutant lacking
functional ctxA and attRS1 DNA sequences). These strains can be genetically
engineered to express heterologous antigens, as described in WO 94/19482.
An effective vaccine dose of a V. cholerae strain capable of expressing a
polypeptide or polypeptide derivative encoded by a polynucleotide molecule of
the invention can contain, e.g., about 1x10^5 to about 1x10^9, preferably about
1x10^6 to about 1x10^8 viable bacteria in an appropriate volume for the selected
route of administration. Preferred routes of administration include all mucosal
routes, but, most preferably, these vectors are administered intranasally or
orally.

Attenuated Salmonella typhimurium strains, genetically engineered
for recombinant expression of heterologous antigens, and their use as oral
vaccines, are described by Nakayama et al. (Bio/Technology 6:693, 1988) and
in WO 92/11361. Preferred routes of administration for these vectors include
all mucosal routes. Most preferably, the vectors are administered intranasally
or orally.

Other bacterial strains useful as vaccine vectors are described by
High et al. (EMBO 11:1991, 1992) and Sizemore et al. (Science 270:299,
1995; Shigella flexneri); Medaglini et al. (Proc. Natl. Acad. Sci. USA 92:6868,
1995; Streptococcus gordonii); Flynn (Cell. Mol. Biol. 40 (suppl. I):31, 1994),
and in WO 88/6626, WO 90/0594, WO 91/13157, WO 92/1796, and WO
92/21376 (Bacille Calmette Guerin). In bacterial vectors, a polynucleotide of
the invention can be inserted into the bacterial genome or it can remain in a free
state, for example, carried on a plasmid.

An adjuvant can also be added to a composition containing a 
bacterial vector vaccine. A number of adjuvants that can be used are known to 
those skilled in the art. For example, preferred adjuvants can be selected from 
the list provided below. 

According to a fourth aspect of the invention, there is also provided
(i) a composition of matter containing a polynucleotide of the invention,
together with a diluent or carrier; (ii) a pharmaceutical composition containing
a therapeutically or prophylactically effective amount of a polynucleotide of the
invention; (iii) a method for inducing an immune response against
Helicobacter, in a mammal, by administering to the mammal an
immunogenically effective amount of a polynucleotide of the invention to elicit
an immune response, e.g., a protective immune response to Helicobacter; and
(iv) a method for preventing and/or treating a Helicobacter (e.g., H. pylori, H.
felis, H. mustelae, or H. heilmanii) infection, by administering a prophylactic or
therapeutic amount of a polynucleotide of the invention to an individual in need
of such treatment. Additionally, the fourth aspect of the invention encompasses
the use of a polynucleotide of the invention in the preparation of a medicament
for preventing and/or treating Helicobacter infection. The fourth aspect of the
invention preferably includes the use of a polynucleotide molecule placed
under conditions for expression in a mammalian cell, e.g., in a plasmid that is
unable to replicate in mammalian cells and to substantially integrate into a
mammalian genome.
Polynucleotides (for example, DNA or RNA molecules) of the
invention can also be administered as such to a mammal as a vaccine. When a
DNA molecule of the invention is used, it can be in the form of a plasmid that
is unable to replicate in a mammalian cell and unable to integrate into the
mammalian genome. Typically, a DNA molecule is placed under the control of
a promoter suitable for expression in a mammalian cell. The promoter can
function ubiquitously or tissue-specifically. Examples of non-tissue specific
promoters include the early Cytomegalovirus (CMV) promoter (U.S. Patent
No. 4,168,062) and the Rous Sarcoma Virus promoter (Norton et al., Molec.
Cell Biol. 5:281, 1985). The desmin promoter (Li et al., Gene 78:243, 1989; Li
et al., J. Biol. Chem. 266:6562, 1991; Li et al., J. Biol. Chem. 268:10403,
1993) is tissue-specific and drives expression in muscle cells. More generally,
useful promoters and vectors are described, e.g., in WO 94/21797 and by
Hartikka et al. (Human Gene Therapy 7:1205, 1996).

For DNA/RNA vaccination, the polynucleotide of the invention can
encode a precursor or a mature form of a polypeptide of the invention. When it
encodes a precursor form, the precursor sequence can be homologous or
heterologous. In the latter case, a eucaryotic leader sequence can be used, such
as the leader sequence of the tissue-type plasminogen activator (tPA).

A composition of the invention can contain one or several
polynucleotides of the invention. It can also contain at least one additional
polynucleotide encoding another Helicobacter antigen, such as urease subunit
A, B, or both, or a fragment, derivative, mutant, or analog thereof. A
polynucleotide encoding a cytokine, such as interleukin-2 (IL-2) or
interleukin-12 (IL-12), can also be added to the composition so that the immune
response is enhanced. These additional polynucleotides are placed under
appropriate control for expression. Advantageously, DNA molecules of the
invention and/or additional DNA molecules to be included in the same
composition are carried in the same plasmid.

Standard methods can be used in the preparation of therapeutic
polynucleotides of the invention. For example, a polynucleotide can be used in
a naked form, free of any delivery vehicles, such as anionic liposomes, cationic
lipids, microparticles, e.g., gold microparticles, precipitating agents, e.g.,
calcium phosphate, or any other transfection-facilitating agent. In this case, the
polynucleotide can be simply diluted in a physiologically acceptable solution,
such as sterile saline or sterile buffered saline, with or without a carrier. When
present, the carrier preferably is isotonic, hypotonic, or weakly hypertonic, and
has a relatively low ionic strength, such as provided by a sucrose solution, e.g.,
a solution containing 20% sucrose.

Alternatively, a polynucleotide can be associated with agents that
assist in cellular uptake. It can be, e.g., (i) complemented with a chemical
agent that modifies cellular permeability, such as bupivacaine (see, e.g.,
WO 94/16737), (ii) encapsulated into liposomes, or (iii) associated with
cationic lipids or silica, gold, or tungsten microparticles.

Anionic and neutral liposomes are well-known in the art (see, e.g.,
Liposomes: A Practical Approach, RPC New Ed, IRL Press, 1990, for a
detailed description of methods for making liposomes) and are useful for
delivering a large range of products, including polynucleotides.

Cationic lipids can also be used for gene delivery. Such lipids
include, for example, Lipofectin™, which is also known as DOTMA
(N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium chloride), DOTAP
(1,2-bis(oleyloxy)-3-(trimethylammonio)propane), DDAB
(dimethyldioctadecylammonium bromide), DOGS (dioctadecylamidologlycyl
spermine), and cholesterol derivatives. A description of these cationic lipids
can be found in EP 187,702, WO 90/11092, U.S. Patent No. 5,283,185,
WO 91/15501, WO 95/26356, and U.S. Patent No. 5,527,928. Cationic lipids
for gene delivery are preferably used in association with a neutral lipid such as
DOPE (dioleyl phosphatidyl ethanolamine; WO 90/11092). Other
transfection-facilitating compounds can be added to a formulation containing
cationic liposomes. A number of them are described in, e.g., WO 93/18759,
WO 93/19768, WO 94/25608, and WO 95/2397. They include, e.g., spermine
derivatives useful for facilitating the transport of DNA through the nuclear
membrane (see, for example, WO 93/18759) and membrane-permeabilizing
compounds such as GALA, Gramicidine S, and cationic bile salts (see, for
example, WO 93/19768).

Gold or tungsten microparticles can also be used for gene delivery,
as described in WO 91/359, WO 93/17706, and by Tang et al. (Nature 356:152,
1992). In this case, the microparticle-coated polynucleotides can be injected
via intradermal or intraepidermal routes using a needleless injection device
("gene gun"), such as those described in U.S. Patent No. 4,945,050, U.S. Patent
No. 5,015,580, and WO 94/24263.

The amount of DNA to be used in a vaccine recipient depends, e.g.,
on the strength of the promoter used in the DNA construct, the immunogenicity
of the expressed gene product, the condition of the mammal intended for
administration (e.g., the weight, age, and general health of the mammal), the
mode of administration, and the type of formulation. In general, a
therapeutically or prophylactically effective dose from about 1 µg to about
1 mg, preferably, from about 10 µg to about 800 µg, and, more preferably, from
about 25 µg to about 250 µg, can be administered to human adults. The
administration can be achieved in a single dose or repeated at intervals.
The route of administration can be any conventional route used in the
vaccine field. As general guidance, a polynucleotide of the invention can be
administered via a mucosal surface, e.g., an ocular, intranasal, pulmonary, oral,
intestinal, rectal, vaginal, or urinary tract surface, or via a parenteral route, e.g.,
by an intravenous, subcutaneous, intraperitoneal, intradermal, intraepidermal,
or intramuscular route. The choice of administration route will depend on, e.g.,
the formulation that is selected. A polynucleotide formulated in association
with bupivacaine is advantageously administered into muscle. When a neutral
or anionic liposome or a cationic lipid, such as DOTMA, is used, the
formulation can be advantageously injected via intravenous, intranasal (for
example, by aerosolization), intramuscular, intradermal, and subcutaneous
routes. A polynucleotide in a naked form can advantageously be administered
via the intramuscular, intradermal, or subcutaneous routes. Although not
absolutely required, such a composition can also contain an adjuvant. A
systemic adjuvant that does not require concomitant administration in order to
exhibit an adjuvant effect is preferable.

The sequence information provided in the present application enables
the design of specific nucleotide probes and primers that can be used in
diagnostic methods. Accordingly, in a fifth aspect of the invention, there is
provided a nucleotide probe or primer having a sequence found in, or derived
by degeneracy of the genetic code from, a sequence shown in any of SEQ ID
NOs:1-21 (odd numbers), 65, and 67, or a complementary sequence thereof.

The term "probe" as used in the present application refers to DNA
(preferably single stranded) or RNA molecules (or modifications or
combinations thereof) that hybridize under the stringent conditions, as defined
above, to polynucleotide molecules having sequences homologous to those
shown in any of SEQ ID NOs:1-21 (odd numbers), 65, and 67, or to a
complementary or anti-sense sequence of any of SEQ ID NOs:1-21 (odd
numbers), 65, and 67. Generally, probes are significantly shorter than the
full-length sequences shown in any of SEQ ID NOs:1-21 (odd numbers), 65, and
67. For example, they can contain from about 5 to about 100, preferably from
about 10 to about 80 nucleotides. In particular, probes have sequences that are
at least 75%, preferably at least 85%, more preferably 95% homologous to a
portion of a sequence as shown in any of SEQ ID NOs:1-21 (odd numbers), 65,
and 67, or a sequence complementary to such sequences.
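
One simple way to picture the percentage thresholds above is an ungapped
comparison of a probe against every same-length portion of a target sequence.
The sketch below is illustrative only: the text does not say how homology is to
be computed, so treating it as ungapped percent identity is an assumption, and
the function name and example sequences are hypothetical.

```python
def best_ungapped_identity(probe, target):
    """Highest percent identity of `probe` against any same-length window of `target`."""
    probe, target = probe.upper(), target.upper()
    n = len(probe)
    if n == 0 or n > len(target):
        raise ValueError("probe must be non-empty and no longer than target")
    best = 0
    for i in range(len(target) - n + 1):
        # Count matching positions between the probe and this window.
        matches = sum(a == b for a, b in zip(probe, target[i:i + n]))
        best = max(best, matches)
    return 100.0 * best / n


if __name__ == "__main__":
    # Hypothetical 10-mer probe with one mismatch against the target.
    identity = best_ungapped_identity("ACGTACGTAC", "TTTACGTACGTTCTTT")
    print(identity, identity >= 85.0)
```

A fuller treatment would also scan the complementary strand and allow gaps;
both are omitted here to keep the sketch short.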

Probes can contain modified bases, such as inosine,
methyl-5-deoxycytidine, deoxyuridine, dimethylamino-5-deoxyuridine, or
diamino-2,6-purine. Sugar or phosphate residues can also be modified or
substituted. For example, a deoxyribose residue can be replaced by a polyamide
(Nielsen et al., Science 254:1497, 1991) and phosphate residues can be replaced
by ester groups such as diphosphate, alkyl, arylphosphonate, and phosphorothioate
esters. In addition, the 2'-hydroxyl group on ribonucleotides can be modified
by addition of, e.g., alkyl groups.

Probes of the invention can be used in diagnostic tests, or as capture
or detection probes. Such capture probes can be immobilized on solid supports,
directly or indirectly, by covalent means or by passive adsorption. A detection
probe can be labeled by a detectable label, for example a label selected from
radioactive isotopes; enzymes, such as peroxidase and alkaline phosphatase;
enzymes that are able to hydrolyze a chromogenic, fluorogenic, or luminescent
substrate; compounds that are chromogenic, fluorogenic, or luminescent;
nucleotide base analogs; and biotin.

Probes of the invention can be used in any conventional
hybridization method, such as in dot blot methods (Maniatis et al., Molecular
Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, New York, 1982), Southern blot methods (Southern, J. Mol.
Biol. 98:503, 1975), northern blot methods (identical to Southern blot with the
exception that RNA is used as a target), or a sandwich method (Dunn et al.,
Cell 12:23, 1977). As is known in the art, the latter technique involves the use
of a specific capture probe and a specific detection probe that have nucleotide
sequences that are at least partially different from each other.

Primers used in the invention usually contain about 10 to
40 nucleotides and are used to initiate enzymatic polymerization of DNA in an
amplification process (e.g., PCR), an elongation process, or a reverse
transcription method. In a diagnostic method involving PCR, the primers can
be labeled.

Thus, the invention also encompasses (i) a reagent containing a
probe of the invention for detecting and/or identifying the presence of
Helicobacter in a biological material; (ii) a method for detecting and/or
identifying the presence of Helicobacter in a biological material, in which (a) a
sample is recovered or derived from the biological material, (b) DNA or RNA
is extracted from the material and denatured, and (c) the sample is exposed to a
probe of the invention, for example, a capture probe, a detection probe, or both,
under stringent hybridization conditions, so that hybridization is detected; and
(iii) a method for detecting and/or identifying the presence of Helicobacter in a
biological material, in which (a) a sample is recovered or derived from the
biological material, (b) DNA is extracted therefrom, (c) the extracted DNA is
contacted with at least one, or, preferably two, primers of the invention, and
amplified by the polymerase chain reaction, and (d) an amplified DNA
molecule is produced.

As mentioned above, polypeptides that can be produced by 
expression of the polynucleotides of the invention can be used as vaccine 
antigens. Accordingly, a sixth aspect of the invention features a substantially
purified polypeptide or polypeptide derivative having an amino acid sequence 
encoded by a polynucleotide of the invention. 

A "substantially purified polypeptide" is defined as a polypeptide 
that is separated from the environment in which it naturally occurs and/or a
polypeptide that is free of most of the other polypeptides that are present in the 
environment in which it was synthesized. The polypeptides of the invention 
can be purified from a natural source, such as a Helicobacter strain, or can be 
produced using recombinant methods. 

Homologous polypeptides or polypeptide derivatives encoded by
polynucleotides of the invention can be screened for specific antigenicity by
testing cross-reactivity with an antiserum raised against a polypeptide having
an amino acid sequence as shown in any of SEQ ID NOs:2-22 (even numbers),
66, and 68. Briefly, a monospecific hyperimmune antiserum can be raised
against a purified reference polypeptide as such or as a fusion polypeptide, for
example, an expression product of MBP, GST, or His-tag systems, or a
synthetic peptide predicted to be antigenic. The homologous polypeptide or
derivative that is screened for specific antigenicity can be produced as such or
as a fusion polypeptide. In the latter case, and if the antiserum is also raised
against a fusion polypeptide, two different fusion systems are employed.
Specific antigenicity can be determined using a number of methods, including
Western blot (Towbin et al., Proc. Natl. Acad. Sci. USA 76:4350, 1979), dot
blot, and ELISA methods, as described below.

In a Western blot assay, the product to be screened, either as a
purified preparation or a total E. coli extract, is fractionated by SDS-PAGE, as
described, for example, by Laemmli (Nature 227:680, 1970). After being
transferred to a filter, such as a nitrocellulose membrane, the material is
incubated with the monospecific hyperimmune antiserum, which is diluted in a
range of dilutions from about 1:50 to about 1:5000, preferably from about
1:100 to about 1:500. Specific antigenicity is shown once a band
corresponding to the product exhibits reactivity at any of the dilutions in the
range.

In an ELISA assay, the product to be screened can be used as the
coating antigen. A purified preparation is preferred, but a whole cell extract
can also be used. Briefly, about 100 µl of a preparation of about 10 µg
protein/ml is distributed into wells of a 96-well ELISA plate. The plate is
incubated for about 2 hours at 37°C, then overnight at 4°C. The plate is
washed with phosphate buffered saline (PBS) containing 0.05% Tween 20
(PBS/Tween buffer) and the wells are saturated with 250 µl PBS containing
1% bovine serum albumin (BSA), to prevent non-specific antibody binding.
After 1 hour of incubation at 37°C, the plate is washed with PBS/Tween buffer.
The antiserum is serially diluted in PBS/Tween buffer containing 0.5% BSA,
and 100 µl of each dilution is added to a well. The plate is incubated for
90 minutes at 37°C, washed, and evaluated using standard methods. For
example, a goat anti-rabbit peroxidase conjugate can be added to the wells
when the specific antibodies used were raised in rabbits. Incubation is carried
out for about 90 minutes at 37°C and the plate is washed. The reaction is
developed with the appropriate substrate and the reaction is measured by
colorimetry (absorbance measured spectrophotometrically). Under these
experimental conditions, a positive reaction is shown once an O.D. value of 1.0
is detected with a dilution of at least about 1:50, preferably of at least about
1:500.
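
The positivity criterion stated above for the ELISA can be written as a short check. The following Python sketch is illustrative only: the function name, the data layout (a mapping from reciprocal dilution to O.D.), and the result labels are assumptions introduced here, and an O.D. of at least 1.0 is assumed to count as detection. Only the threshold values (O.D. 1.0, dilutions of 1:50 and 1:500) come from the text.

```python
def elisa_result(od_by_dilution, od_cutoff=1.0,
                 min_dilution=50, preferred_dilution=500):
    """Classify one ELISA titration (hypothetical helper).

    od_by_dilution maps the reciprocal dilution of the antiserum
    (e.g. 500 for a 1:500 dilution) to the measured O.D.
    """
    # Dilutions of at least 1:50 at which the O.D. reaches the cutoff.
    positive = [d for d, od in od_by_dilution.items()
                if d >= min_dilution and od >= od_cutoff]
    if not positive:
        return "negative"
    # The preferred criterion is reactivity at 1:500 or beyond.
    if max(positive) >= preferred_dilution:
        return "positive (preferred)"
    return "positive"
```

With made-up readings, an antiserum that gives O.D. 1.2 at 1:100 but 0.6 at 1:500 would be classified as "positive", whereas one that still gives O.D. 1.1 at 1:500 would meet the preferred criterion.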

In a dot blot assay, a purified product is preferred, although a whole
cell extract can be used. Briefly, a solution of the product at a concentration of
about 100 µg/ml is serially diluted two-fold with 50 mM Tris-HCl (pH 7.5).
One hundred µl of each dilution is applied to a filter, such as a 0.45 µm
nitrocellulose membrane, set in a 96-well dot blot apparatus (Biorad). The
buffer is removed by applying vacuum to the system. Wells are washed by
addition of 50 mM Tris-HCl (pH 7.5) and the membrane is air-dried. The
membrane is saturated in blocking buffer (50 mM Tris-HCl (pH 7.5), 0.15 M
NaCl, 10 g/L skim milk) and incubated with an antiserum diluted from about
1:50 to about 1:5000, preferably about 1:500. The reaction is detected using
standard methods. For example, a goat anti-rabbit peroxidase conjugate can be
added to the wells when rabbit antibodies are used. Incubation is carried out
for about 90 minutes at 37°C and the blot is washed. The reaction is developed
with the appropriate substrate and stopped. The reaction is then measured
visually by the appearance of a colored spot, e.g., by colorimetry. Under these
experimental conditions, a positive reaction is associated with detection of a
colored spot for reactions carried out with a dilution of at least about 1:50,
preferably, of at least about 1:500. Therapeutic or prophylactic efficacy of a
polypeptide or polypeptide derivative of the invention can be evaluated as is
described below.
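
The two-fold dilution series used in the dot blot assay above can be tabulated as follows. This Python sketch is a minimal illustration: the function names and the number of dilution steps (eight) are assumptions, since the text does not state how many dilutions are applied. The starting concentration (about 100 µg/ml) and the volume per spot (100 µl) are taken from the text.

```python
def twofold_series(start_ug_per_ml=100.0, steps=8):
    """Concentrations (µg/ml) of a two-fold serial dilution.

    steps is a hypothetical choice; the first entry is the undiluted solution.
    """
    return [start_ug_per_ml / 2 ** i for i in range(steps)]


def ug_per_spot(conc_ug_per_ml, volume_ul=100.0):
    """Mass of product (µg) applied to the membrane in one spot."""
    return conc_ug_per_ml * volume_ul / 1000.0


if __name__ == "__main__":
    for conc in twofold_series():
        print(f"{conc:8.3f} µg/ml -> {ug_per_spot(conc):7.4f} µg per spot")
```

Under these assumptions the first spot carries about 10 µg of product and each subsequent spot carries half the amount of the previous one.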

According to a seventh aspect of the invention, there is provided (i) a
composition of matter containing a polypeptide of the invention together with a
diluent or carrier; (ii) a pharmaceutical composition containing a
therapeutically or prophylactically effective amount of a polypeptide of the
invention; (iii) a method for inducing an immune response against Helicobacter
in a mammal by administering to the mammal an immunogenically effective
amount of a polypeptide of the invention to elicit an immune response, e.g., a
protective immune response to Helicobacter; and (iv) a method for preventing
and/or treating a Helicobacter (e.g., H. pylori, H. felis, H. mustelae, or H.
heilmanii) infection, by administering a prophylactic or therapeutic amount of a
polypeptide of the invention to an individual in need of such treatment.
Additionally, this aspect of the invention includes the use of a polypeptide of
the invention in the preparation of a medicament for preventing and/or treating
Helicobacter infection.

The immunogenic compositions of the invention can be administered
by any conventional route in use in the vaccine field, for example, to a mucosal
(e.g., ocular, intranasal, pulmonary, oral, gastric, intestinal, rectal, vaginal, or
urinary tract) surface or via a parenteral (e.g., subcutaneous, intradermal,
intramuscular, intravenous, or intraperitoneal) route. The choice of the
administration route depends upon a number of parameters, such as the
adjuvant used. For example, if a mucosal adjuvant is used, the intranasal or
oral route will be preferred, and if a lipid formulation or an aluminum
compound is used, a parenteral route will be preferred. In the latter case, the
subcutaneous or intramuscular route is most preferred. The choice of
administration route can also depend upon the nature of the vaccine agent. For
example, a polypeptide of the invention fused to CTB or to LTB will be best
administered to a mucosal surface.

A composition of the invention can contain one or several
polypeptides or derivatives of the invention. It can also contain at least one
additional Helicobacter antigen, such as the urease apoenzyme, or a subunit,
fragment, homolog, mutant, or derivative thereof.

For use in a composition of the invention, a polypeptide or
polypeptide derivative can be formulated into or with liposomes, such as
neutral or anionic liposomes, microspheres, ISCOMS, or virus-like particles
(VLPs), to facilitate delivery and/or enhance the immune response. These
compounds are readily available to those skilled in the art; see, for example,
Liposomes: A Practical Approach (supra). Adjuvants other than liposomes can
also be used in the invention and are well known in the art (see, for example,
the list provided below).

Administration can be achieved in a single dose or repeated as
necessary at intervals that can be determined by one skilled in the art. For
example, a priming dose can be followed by three booster doses at weekly or
monthly intervals. An appropriate dose depends on various parameters,
including the nature of the recipient (e.g., whether the recipient is an adult or an
infant), the particular vaccine antigen, the route and frequency of
administration, the presence/absence or type of adjuvant, and the desired effect
(e.g., protection and/or treatment), and can be readily determined by one skilled
in the art. In general, a vaccine antigen of the invention can be administered
mucosally in an amount ranging from about 10 µg to about 500 mg, preferably
from about 1 mg to about 200 mg. For a parenteral route of administration, the
dose usually should not exceed about 1 mg, and is, preferably, about 100 µg.

When used as components of a vaccine, the polynucleotides and
polypeptides of the invention can be used sequentially as part of a multi-step
immunization process. For example, a mammal can be initially primed with a
vaccine vector of the invention, such as a pox virus, e.g., via a parenteral route,
and then boosted twice with a polypeptide encoded by the vaccine vector, e.g.,
via the mucosal route. In another example, liposomes associated with a
polypeptide or polypeptide derivative of the invention can be used for priming,
with boosting being carried out mucosally using a soluble polypeptide or
polypeptide derivative of the invention, in combination with a mucosal
adjuvant (e.g., LT).

Polypeptides and polypeptide derivatives of the invention can also be 
used as diagnostic reagents for detecting the presence of anti-Helicobacter 
antibodies, e.g., in blood samples. Such polypeptides can be about 5 to about
80, preferably, about 10 to about 50 amino acids in length and can be labeled or 
unlabeled, depending upon the diagnostic method. Diagnostic methods 
involving such a reagent are described below. 

Upon expression of a polynucleotide molecule of the invention, a
polypeptide or polypeptide derivative is produced and can be purified using
known methods. For example, the polypeptide or polypeptide derivative can be
produced as a fusion protein containing a fused tail that facilitates purification.
The fusion product can be used to immunize a small mammal, e.g., a mouse or
a rabbit, in order to raise monospecific antibodies against the polypeptide or
polypeptide derivative. The eighth aspect of the invention thus provides a
monospecific antibody that binds to a polypeptide or polypeptide derivative of
the invention.

By "monospecific antibody" is meant an antibody that is capable of
reacting with a unique, naturally-occurring Helicobacter polypeptide. An
antibody of the invention can be polyclonal or monoclonal. Monospecific
antibodies can be recombinant, e.g., chimeric (e.g., consisting of a variable
region of murine origin and a human constant region), humanized (e.g., a
human immunoglobulin constant region and a variable region of animal, e.g.,
murine, origin), and/or single chain. Both polyclonal and monospecific
antibodies can also be in the form of immunoglobulin fragments, e.g., F(ab')2
or Fab fragments. The antibodies of the invention can be of any isotype, e.g.,
IgG or IgA, and polyclonal antibodies can be of a single isotype or can contain
a mixture of isotypes.

The antibodies of the invention, which can be raised to a polypeptide
or polypeptide derivative of the invention, can be produced and identified using
standard immunological assays, e.g., Western blot assays, dot blot assays, or
ELISA (see, e.g., Coligan et al., Current Protocols in Immunology, John Wiley
& Sons, Inc., New York, NY, 1994). The antibodies can be used in diagnostic
methods to detect the presence of Helicobacter antigens in a sample, such as a
biological sample. The antibodies can also be used in affinity chromatography
methods for purifying a polypeptide or polypeptide derivative of the invention.
As is discussed further below, the antibodies can also be used in prophylactic
and therapeutic passive immunization methods.

Accordingly, a ninth aspect of the invention provides (i) a reagent for
detecting the presence of Helicobacter in a biological sample that contains an
antibody, polypeptide, or polypeptide derivative of the invention; and (ii) a
diagnostic method for detecting the presence of Helicobacter in a biological
sample, by contacting the biological sample with an antibody, a polypeptide, or
a polypeptide derivative of the invention, so that an immune complex is
formed, and detecting the complex as an indication of the presence of
Helicobacter in the sample or the organism from which the sample was
derived. The immune complex is formed between a component of the sample
and the antibody, polypeptide, or polypeptide derivative, and any unbound
material can be removed prior to detecting the complex. A polypeptide reagent
can be used for detecting the presence of anti-Helicobacter antibodies in a
sample, e.g., a blood sample, while an antibody of the invention can be used for
screening a sample, such as a gastric extract or biopsy sample, for the presence
of Helicobacter polypeptides.

For use in diagnostic methods, the reagent (e.g., the antibody,
polypeptide, or polypeptide derivative of the invention) can be in a free state or
can be immobilized on a solid support, such as, for example, on the interior
surface of a tube or on the surface, or within pores, of a bead. Immobilization
can be achieved using direct or indirect means. Direct means include passive






WO 98/43479 




PCT/US98/06421 




adsorption (i.e., non-covalent binding) or covalent binding between the support
and the reagent. By "indirect means" is meant that an anti-reagent compound
that interacts with the reagent is first attached to the solid support. For
example, if a polypeptide reagent is used, an antibody that binds to it can serve
as an anti-reagent, provided that it binds to an epitope that is not involved in
recognition of antibodies in biological samples. Indirect means can also
employ a ligand-receptor system, for example, a molecule, such as a vitamin,
can be grafted onto the polypeptide reagent and the corresponding receptor can
be immobilized on the solid phase. This concept is illustrated by the well
known biotin-streptavidin system. Alternatively, indirect means can be used,
e.g., by adding to the reagent a peptide tail, chemically or by genetic
engineering, and immobilizing the grafted or fused product by passive
adsorption or covalent linkage of the peptide tail.

According to a tenth aspect of the invention, there is provided a
process for purifying, from a biological sample, a polypeptide or polypeptide
derivative of the invention, which involves carrying out antibody-based affinity
chromatography with the biological sample, wherein the antibody is a
monospecific antibody of the invention.

For use in a purification process of the invention, the antibody can be
polyclonal or monospecific, and preferably is of the IgG type. Purified IgGs
can be prepared from an antiserum using standard methods (see, e.g., Coligan
et al., supra). Conventional chromatography supports, as well as standard
methods for grafting antibodies, are described, for example, by Harlow et al.
(Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, New York, 1988).

Briefly, a biological sample, such as an H. pylori extract, preferably
in a buffer solution, is applied to a chromatography material, which is,
preferably, equilibrated with the buffer used to dilute the biological sample, so
that the polypeptide or polypeptide derivative of the invention (i.e., the antigen)
is allowed to adsorb onto the material. The chromatography material, such as a
gel or a resin coupled to an antibody of the invention, can be in batch form or in
a column. The unbound components are washed off and the antigen is eluted
with an appropriate elution buffer, such as a glycine buffer, a buffer containing
a chaotropic agent, e.g., guanidine HCl, or a buffer having high salt
concentration (e.g., 3 M MgCl2). Eluted fractions are recovered and the
presence of the antigen is detected, e.g., by measuring the absorbance at 280
nm.

An antibody of the invention can be screened for therapeutic efficacy
as follows. According to an eleventh aspect of the invention, there is provided
(i) a composition of matter containing a monospecific antibody of the
invention, together with a diluent or carrier; (ii) a pharmaceutical composition
containing a therapeutically or prophylactically effective amount of a
monospecific antibody of the invention; and (iii) a method for treating or
preventing Helicobacter (e.g., H. pylori, H. felis, H. mustelae, or H. heilmanii)
infection, by administering a therapeutic or prophylactic amount of a
monospecific antibody of the invention to an individual in need of such
treatment. In addition, the eleventh aspect of the invention includes the use of a
monospecific antibody of the invention in the preparation of a medicament for
treating or preventing Helicobacter infection.

The monospecific antibody can be polyclonal or monoclonal, and is,
preferably, predominantly of the IgA isotype. In passive immunization
methods, the antibody is administered to a mucosal surface of a mammal, e.g.,
the gastric mucosa, e.g., orally or intragastrically, optionally, in the presence of
a bicarbonate buffer. Alternatively, systemic administration, not requiring a
bicarbonate buffer, can be carried out. A monospecific antibody of the
invention can be administered as a single active agent or as a mixture with at
least one additional monospecific antibody specific for a different Helicobacter
polypeptide. The amount of antibody and the particular regimen used can be
readily determined by one skilled in the art. For example, daily administration
of about 100 to 1,000 mg of antibody over one week, or three doses per day of
about 100 to 1,000 mg of antibody over two or three days, can be effective
regimens for most purposes.

Therapeutic or prophylactic efficacy can be evaluated using standard
methods in the art, e.g., by measuring induction of a mucosal immune response
or induction of protective and/or therapeutic immunity, using, e.g., the H. felis
mouse model and the procedures described by Lee et al. (Eur. J.
Gastroenterology & Hepatology 7:303, 1995) or Lee et al. (J. Infect. Dis.
172:161, 1995). Those skilled in the art will recognize that the H. felis strain of
the model can be replaced with another Helicobacter strain. For example, the
efficacy of polynucleotide molecules and polypeptides from H. pylori is,
preferably, evaluated in a mouse model using an H. pylori strain. Protection
can be determined by comparing the degree of Helicobacter infection in the
gastric tissue assessed by, for example, urease activity, bacterial counts, or
gastritis, to that of a control group. Protection is shown when infection is
reduced by comparison to the control group. Such an evaluation can be made
for polynucleotides, vaccine vectors, polypeptides, and polypeptide derivatives,
as well as for antibodies of the invention.

For example, various doses of an antibody of the invention can be
administered to the gastric mucosa of mice previously challenged with an H.
pylori strain, as described, e.g., by Lee et al. (supra). Then, after an
appropriate period of time, the bacterial load of the mucosa can be estimated by
assessing urease activity, as compared to a control. Reduced urease activity
indicates that the antibody is therapeutically effective.

Adjuvants that can be used in any of the vaccine compositions
described above are described as follows. Adjuvants for parenteral
administration include, for example, aluminum compounds, such as aluminum
hydroxide, aluminum phosphate, and aluminum hydroxy phosphate. The
antigen can be precipitated with, or adsorbed onto, the aluminum compound
using standard methods. Other adjuvants, such as RIBI (ImmunoChem,
Hamilton, MT), can also be used in parenteral administration.

Adjuvants that can be used for mucosal administration include, for
example, bacterial toxins, e.g., the cholera toxin (CT), the E. coli heat-labile
toxin (LT), the Clostridium difficile toxin A, the pertussis toxin (PT), and
combinations, subunits, toxoids, or mutants thereof. For example, a purified
preparation of native cholera toxin subunit B (CTB) can be used. Fragments,
homologs, derivatives, and fusions to any of these toxins can also be used,
provided that they retain adjuvant activity. Preferably, a mutant having
reduced toxicity is used. Suitable mutants are described, e.g., in WO 95/17211
(Arg-7-Lys CT mutant), WO 96/6627 (Arg-192-Gly LT mutant), and WO
95/34323 (Arg-9-Lys and Glu-129-Gly PT mutant). Additional LT mutants
that can be used in the methods and compositions of the invention include, e.g.,
Ser-63-Lys, Ala-69-Gly, Glu-110-Asp, and Glu-112-Asp mutants. Other
adjuvants, such as the bacterial monophosphoryl lipid A (MPLA) of, e.g., E.
coli, Salmonella minnesota, Salmonella typhimurium, or Shigella flexneri;
saponins; and polylactide glycolide (PLGA) microspheres, can also be used in
mucosal administration. Adjuvants useful for both mucosal and parenteral
administrations, such as polyphosphazene (WO 95/2415), can also be used.

Any pharmaceutical composition of the invention, containing a
polynucleotide, polypeptide, polypeptide derivative, or antibody of the
invention, can be manufactured using standard methods. It can be formulated
with a pharmaceutically acceptable diluent or carrier, e.g., water or a saline
solution, such as PBS, optionally, including a bicarbonate salt, such as sodium
bicarbonate, e.g., 0.1 to 0.5 M. Bicarbonate can advantageously be added to
compositions intended for oral or intragastric administration. In general, a
diluent or carrier can be selected on the basis of the mode and route of
administration, and standard pharmaceutical practice. Suitable pharmaceutical
carriers and diluents, as well as pharmaceutical necessities for their use in
pharmaceutical formulations, are described in Remington's Pharmaceutical
Sciences, a standard reference text in this field, and in the USP/NF.

The invention also includes methods in which gastroduodenal
infections, such as Helicobacter infection, are treated by oral administration of
a Helicobacter polypeptide of the invention and a mucosal adjuvant, in
combination with an antibiotic, an antisecretory agent, a bismuth salt, an
antacid, sucralfate, or a combination thereof. Examples of such compounds
that can be administered with the vaccine antigen and an adjuvant are
antibiotics, including, e.g., macrolides, tetracyclines, β-lactams,
aminoglycosides, quinolones, penicillins, and derivatives thereof (specific
examples of antibiotics that can be used in the invention include, e.g.,
amoxicillin, clarithromycin, tetracycline, metronidazole, erythromycin, and
cefuroxime); antisecretory agents, including, e.g., H2-receptor
antagonists (e.g., cimetidine, ranitidine, famotidine, nizatidine, and
roxatidine), proton pump inhibitors (e.g., omeprazole, lansoprazole, and
pantoprazole), prostaglandin analogs (e.g., misoprostol and enprostil), and
anticholinergic agents (e.g., pirenzepine, telenzepine, carbenoxolone, and
proglumide); and bismuth salts, including colloidal bismuth subcitrate,
tripotassium dicitrate bismuthate, bismuth subsalicylate, bicitropeptide, and
Pepto-Bismol (see, e.g., Goodwin et al., Helicobacter pylori, Biology and
Clinical Practice, CRC Press, Boca Raton, FL, pp 366-395, 1993; Physicians'
Desk Reference, 49th edn., Medical Economics Data Production Company,
Montvale, New Jersey, 1995). In addition, compounds containing more than
one of the above-listed components coupled together, e.g., ranitidine coupled to
bismuth subcitrate, can be used. The invention also includes compositions for
carrying out these methods, i.e., compositions containing a Helicobacter
antigen (or antigens) of the invention, an adjuvant, and one or more of the
above-listed compounds, in a pharmaceutically acceptable carrier or diluent.

Amounts of the above-listed compounds used in the methods and
compositions of the invention can readily be determined by one skilled in the
art. In addition, one skilled in the art can readily design
treatment/immunization schedules. For example, the non-vaccine components
can be administered on days 1-14, and the vaccine antigen + adjuvant can be
administered on days 1, 14, 21, and 28.
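
The example schedule above can be laid out day by day. The Python sketch below is only an illustration of that one example: the function name and the dictionary layout are assumptions made here, while the day numbers (days 1-14 for the non-vaccine components; days 1, 14, 21, and 28 for the vaccine antigen and adjuvant) are those given in the text.

```python
def build_schedule(drug_days=range(1, 15), vaccine_days=(1, 14, 21, 28)):
    """Map each treatment day to what is administered on that day.

    Hypothetical helper; the defaults reproduce the example in the text.
    """
    schedule = {}
    for day in sorted(set(drug_days) | set(vaccine_days)):
        items = []
        if day in drug_days:
            items.append("non-vaccine components")
        if day in vaccine_days:
            items.append("vaccine antigen + adjuvant")
        schedule[day] = items
    return schedule


if __name__ == "__main__":
    for day, items in build_schedule().items():
        print(f"day {day:2d}: {', '.join(items)}")
```

In this layout, days 1 and 14 are the only days on which both the non-vaccine components and the vaccine are given; days 21 and 28 are vaccine-only.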

Methods and pharmaceutical compositions of the invention can be 
used to treat or to prevent Helicobacter infections and, accordingly, 
gastroduodenal diseases associated with these infections, including acute,
chronic, and atrophic gastritis, and peptic ulcer diseases, e.g., gastric and 
duodenal ulcers. 

A 76 kDa protein band containing GHPO 386, GHPO 789, and
GHPO 1516 (hereinafter the "purified 76 kDa proteins"), GHPO 1360, and
GHPO 750 were purified from Helicobacter pylori strain ATCC number 43579
(American Type Culture Collection, Rockville, Maryland) by
immunoaffinity-based chromatography using the methods described below in
Example 1, and were shown to be effective vaccine antigens as follows.

Groups of 10 mice each were orally immunized with 1, 5, or 25 µg of
the purified 76 kDa proteins, purified GHPO 1360, or purified GHPO 750 in
combination with 5 µg of the heat-labile enterotoxin (LT) of E. coli. Twenty-five
µg of recombinant urease, in combination with 5 µg LT, was used as a
positive control, and 5 µg of LT in PBS was used as a negative control. The
immunizations were carried out four times each, on days 0, 7, 14, and 21 of the
experiment. On day 33, blood samples were collected from the mice and, on
day 34, saliva samples were collected. On day 35, all of the mice were
challenged by intragastric administration of 1 x 10^7 streptomycin-resistant,
mouse-adapted H. pylori. On day 49, additional saliva samples were collected
and, about two weeks after challenge, on days 52-53, the mice were sacrificed.
Stomachs were removed from the mice and were analyzed for Helicobacter
infection by measuring urease activity in the intact stomach tissue and by a
quantitative culture study (Table 1).

Briefly, these studies showed that the gastric urease activities in
samples from mice immunized with all three amounts of the purified 76 kDa
proteins (i.e., 1, 5, and 25 µg), in combination with LT, were generally lower
than the gastric urease activities of samples from mice immunized with LT
alone or mice that were not treated prior to challenge. Levels of gastric urease
activity generally decreased with increasing amounts of the protein
administered, with the gastric urease activity levels for the 25 µg doses
generally approaching those of mice immunized with 25 µg of recombinant
urease and LT.

The quantitative culture analyses showed that the levels of
Helicobacter detected in the stomachs of mice immunized with the purified 76
kDa proteins, purified GHPO 1360, or purified GHPO 750, which generally
decreased with increasing dosages, were less than the levels detected in the
stomachs of control mice that were immunized with LT alone or untreated
before Helicobacter challenge (Tables 1 and 2). The percentages of mice
protected by immunization with the purified 76 kDa proteins, purified GHPO
1360, or purified GHPO 750 met or approached the percentages of mice
protected by treatment with urease (Tables 1 and 2). These results show that
the purified 76 kDa proteins, GHPO 1360, and GHPO 750 are effective vaccine
antigens for use in preventing Helicobacter infection.

Table 1

Prophylactic Immunization with PMsv Antigens as
Oral Dose Response Against H. pylori Challenge

Columns:
(A) BALB/c mice, # mice infected, antrum (based on quantitative A550,
    0.148 O.D. cutoff)
(B) Fisher's exact test, infection status (based on quantitative A550
    ratios, treatment group v. LT only (group 11)), p-value
(C) CFU/ml (1/4 antrum), mean ± SD
(D) Wilcoxon rank sums test, CFU treatment group v. LT only control
    (group 11), p-value

Treatment            (A)          (B)              (C)                (D)
-------------------  -----------  ---------------  -----------------  ------
1 µg 50 kDa + LT     60% (6/10)   0.3034           30825 ± 23210      0.1736
5 µg 50 kDa + LT     40% (4/10)   0.0573           18910 ± 16341      0.0588
25 µg 50 kDa + LT    30% (3/10)   0.0198           22710 ± 32397      0.0821
1 µg 32 kDa + LT     50% (5/10)   0.1409           44225 ± 87824      0.0756
5 µg 50 kDa + LT     10% (1/10)   0.0011           11811 ± 11579      0.0191
25 µg 50 kDa + LT    0 (0/9)      0.0001           1608 ± 23917       0.0114
25 µg rUre + LT      0 (0/9)      0.0001           8208 ± 8021        0.0179
LT                   90% (9/10)   not determined   107340 ± 127949    0.2568
                     90% (9/10)                    46173 ± 42325

Table 2

Prophylactic Immunization with PMsv Antigens as
Oral Dose Response Against H. pylori Challenge

Columns:
(A) BALB/c mice, # mice infected, antrum (based on quantitative A550,
    0.148 O.D. cutoff)
(B) Fisher's exact test, infection status (based on quantitative A550
    ratios, treatment group v. LT only (group 11)), p-value
(C) CFU/ml (1/4 antrum), mean ± SD
(D) Wilcoxon rank sums test, CFU treatment group v. LT only control
    (group 11), p-value

Treatment            (A)          (B)              (C)                (D)
-------------------  -----------  ---------------  -----------------  ------
1 µg 76 kDa + LT     56% (5/9)    0.1409           39922 ± 34708      0.2203
5 µg 76 kDa + LT     80% (4/5)    1                8802 ± 7788        0.0864
25 µg 76 kDa + LT    33% (3/9)    0.0198           9712 ± 12183       0.0178
25 µg rUre + LT      0 (0/9)      0.0001           8208 ± 8021        0.0179
LT                   90% (9/10)   not determined   107340 ± 127949    0.2568
                     90% (9/10)                    46173 ± 42325
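
The infection-status p-values in Tables 1 and 2 are reported as coming from Fisher's exact test against the LT-only group. The source does not say which software or which variant of the test was used, so the following Python sketch is an assumption-laden illustration rather than the original analysis: it implements the usual two-sided test for a 2x2 table (summing the probabilities of all tables that are no more likely than the observed one), and the function name and argument order are introduced here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are infected / not infected.
    Returns the p-value.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(k):
        # Unnormalised hypergeometric probability of k infected in row 1.
        return comb(row1, k) * comb(row2, col1 - k)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    observed = weight(a)
    # Integer comparison avoids floating-point ties between equally likely tables.
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / comb(n, col1)


if __name__ == "__main__":
    # 3 of 10 infected in a treatment group versus 9 of 10 in the LT-only group.
    print(round(fisher_exact_two_sided(3, 7, 9, 1), 4))
```

For example, 3 of 10 infected mice compared with 9 of 10 in the LT-only group gives a two-sided p-value of approximately 0.0198, and 1 of 10 compared with 9 of 10 gives approximately 0.0011.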



The invention is further illustrated by the following examples.
Example 1 describes purification of GHPO 1516 (76 kDa), GHPO 1360 (32
kDa), and GHPO 750 (50 kDa) from Helicobacter cultures. Example 2
describes identification of genes, e.g., genes encoding 76 kDa proteins, such as
GHPO 386, GHPO 789, GHPO 1516, GHPO 1197, GHPO 1180, GHPO 896,
GHPO 711, GHPO 190, GHPO 185, GHPO 1417, and GHPO 1414, a 32 kDa
protein (GHPO 1360), and a 50 kDa protein (GHPO 750) in the Helicobacter
genome, as well as identification of signal sequences, and primer design for
amplification of genes lacking signal sequences. Example 3 describes cloning
of DNA encoding GHPO 386, GHPO 789, GHPO 1516, GHPO 896, GHPO
1360, and GHPO 750 into a vector that provides a histidine tag, and production
and purification of the resulting His-tagged fusion proteins. Example 4
describes methods for cloning DNA encoding the polypeptides of the invention
so that they can be produced without His-tags. Example 5 describes methods
for purifying recombinant polypeptides of the invention, and Example 6
describes use of the GHPO 1360 polypeptide as a serodiagnostic tool for H.
pylori infection.

EXAMPLE 1: Purification and partial sequence analysis of GHPO 1516
(76 kDa), GHPO 1360 (32 kDa), and GHPO 750 (50 kDa) protein from
Helicobacter pylori

1.A. Culture and initial purification steps

Frozen seeds from H. pylori strain ATCC 43579 are used to seed a
75 cm² flask containing a biphasic medium (a solid phase made of Columbia
agar containing 6% fresh sheep blood and a liquid phase made of trypticase
soy containing 20% fetal calf serum). After 24 hours of culturing under
microaerophilic conditions, the liquid phase is used for seeding several 75 cm²
flasks containing biphasic medium lacking sheep blood. After 24 hours of
culture, the liquid phase is used to seed a 2 L biofermentor in trypticase soy
liquid phase containing 10 g/L beta-cyclodextrin. At OD 1.5-1.8, this culture
is diluted in a 10 L biofermentor containing the liquid medium. After 24 hours,
the bacteria are spun in a centrifuge at 4,000 x g for 30 minutes at 4°C. A 10 L
culture contains about 20 to 30 g (wet weight) bacteria.

The pellet obtained using the method described above is washed with
500 ml PBS (7.650 g NaCl, 0.724 g disodium phosphate, and 0.210 g
monopotassium phosphate for one liter (pH 7.2)) for a one liter culture. The
bacteria are then spun in a centrifuge again under the same conditions.

The pellet (C1) is suspended in 1% N-octyl-D-glucopyranoside
(NOG; 30 ml/L; Sigma). The bacterial suspension is incubated for 1 hour at
room temperature while stirring, spun in a centrifuge at 17,600 x g for
30 minutes at 4°C, and the pellet (C2) is recovered.

The supernatant (S2) is dialyzed against PBS overnight at 4°C while
stirring. The precipitate is recovered by centrifugation at 2,600 x g for
30 minutes at 4°C. The supernatant (S2d) is discarded and the pellet (Cs2d) is
recovered and stored at -20°C.

The pellet (C2) is resuspended in 20 mM Tris-HCl buffer (pH 7.5) and
100 µM Pefabloc (Buffer A), and is homogenized with an ultra-turrax (3821,
Janke and Kunkel). Lysozyme and EDTA are added at 0.1 mg/ml and 1 mM,
respectively.

The homogenate is sonicated three times for 2 minutes each at 4°C,
and then is spun in an ultracentrifuge at 210,000 x g for 30 minutes at 4°C. The
supernatant (S3), which contains the cytoplasmic and periplasmic proteins, is
eliminated, while the pellet is recovered, washed with buffer A, and spun in an
ultracentrifuge at 210,000 x g for 30 minutes at 4°C. The supernatant (S4) is
eliminated and the pellet (C4) is stored at -20°C. This pellet (C4) contains
membrane proteins.

The pellet (C4) is washed in 50 mM NaCO3 (pH 9.5) and 100 µM
Pefabloc (buffer B). The suspension is spun in an ultracentrifuge at
210,000 x g for 30 minutes at 4°C. The supernatant (S5) is eliminated, and the
pellet (C5) is then washed and spun in an ultracentrifuge as is described above.
The supernatant (S6) is eliminated and the pellet (C6) is stored at -20°C.

1.B. Purification of the proteins of membrane fraction C4 by preparative
SDS-PAGE

SDS-PAGE is carried out according to the method of Laemmli
(supra), using a biphasic gel consisting of a 5% polyacrylamide concentrating
gel and a 10% polyacrylamide separating gel. The membrane fraction C4 is
resuspended in buffer A, diluted in an equal volume of 2x sample buffer, and
heated for 5 minutes at 95°C. About 19 mg of protein is applied to the gel
(16 x 12 cm; 5 mm thick). Pre-migration is carried out for 2 hours at 50 V, and
is followed by migration overnight at 65 V. After Coomassie blue staining,
five major bands are revealed that have apparent molecular weights of 87, 76,
54, 50, and 32 kDa. Bands at 50 and 32 kDa appear to be slightly contaminated
with bands at 47 and 35 kDa, respectively.

A band corresponding to the purified 76 kDa proteins, 32 kDa
protein (GHPO 1360), or 50 kDa protein (GHPO 750) is cut out from the gel
and is pounded with an ultra-turrax in 10-20 ml extraction buffer (25 mM
Tris-HCl (pH 8.8), 8 M urea, 10% SDS, 100 µM phenyl methyl sulfonyl fluoride
(PMSF), and 10 µM Pefabloc (buffer C)).

Each homogenate is filtered through a Millipore AP20 filter under
7 bars at room temperature, washed with 5-10 ml buffer C, and then filtered
again. Each filtrate is precipitated with three volumes of a 50/50 mixture of
75% methanol and 75% isopropanol, and then is spun in a centrifuge at
240,000 x g for 16 hours at 10°C.

Each pellet is resuspended in 2 ml of 10 mM NaPO4 (pH 7.0)
containing 1 M NaCl, 0.1% Sarkosyl, 100 µM PMSF, and 6 M urea (buffer D).
The solubilized sample is dialyzed, in order, against 100 ml buffer D containing
4 M urea, 100 ml buffer D containing 2 M urea and 0.5% Sarkosyl, and twice
against 100 ml buffer D that does not contain urea or Sarkosyl. The dialyses
are carried out for 1 hour each while stirring at room temperature. The last
dialysate is incubated for 30 minutes in an ice bath, and then is spun in a
centrifuge at low speed for 10 minutes at 4°C. The supernatant is recovered,
filtered through a Millipore filter (0.45 µm), and stored at -20°C.

1.C. Purification of the 76 kDa, 32 kDa, or 50 kDa protein by
immunoaffinity-based chromatography

1.C.1. Antiserum preparation

Specific polyclonal serum against the purified 76 kDa proteins, the
32 kDa protein (GHPO 1360), or the 50 kDa protein (GHPO 750), which are
purified by preparative SDS-PAGE, is prepared by hyperimmunizing rabbits as
follows. On day 0, a preparation containing 50 µg of the protein mixed with
complete Freund's adjuvant is administered subcutaneously to the rabbits at
multiple sites. The rabbits are boosted at days 21 and 42 with 25 µg of the
protein in incomplete Freund's adjuvant, and are sacrificed at day 60.
Complement is removed from the serum by heating for 30 minutes at 56°C.
The hyperimmune serum is then sterilized by filtration through a Millipore
membrane (0.22 µm).

1.C.2. IgG purification

The hyperimmune serum prepared as described above is applied to a
Protein A Sepharose Fast Flow column (Pharmacia) that is equilibrated with
100 mM Tris-HCl (pH 8.0). The column is washed with 10 column volumes of
100 mM Tris-HCl (pH 8.0), and then with 10 column volumes of 10 mM
Tris-HCl (pH 8.0). IgGs are eluted in 0.1 M glycine buffer (pH 3.0), and are
collected as 5 ml fractions, to each of which 0.25 ml of Tris-HCl (pH 8.0) is
added. The optical density of each fraction is measured at 280 nm, and the
IgG-containing fractions are pooled together and, if necessary, frozen at -70°C.

1.C.3. Preparation of the column

An appropriate amount of CNBr-activated Sepharose 4B gel
(Pharmacia; reference: 17-0430-01) is suspended in 1 mM NaCl buffer (1 g dry
gel provides for 3.5 ml hydrated gel; 5 to 10 mg IgGs can be retained per ml of
hydrated gel). The gel is then washed using a Büchner funnel by adding small
quantities of 1 mM HCl. The total volume of 1 mM HCl that is used amounts
to 200 ml/g of gel.

Purified IgGs are dialyzed for 4 hours at room temperature against
50 volumes of 500 mM sodium phosphate buffer (pH 7.5). The IgGs are then
diluted to 3 mg/ml with the same buffer. IgGs are incubated with the gel
overnight at 5±3°C while stirring. The gel is packed in a chromatography
column and is washed with 2 column volumes of 500 mM phosphate buffer
(pH 7.5). The gel is then transferred to a tube and is incubated with 100 mM
ethanolamine (pH 7.5), and then it is washed with 2 column volumes of PBS.
The gel can be stored in PBS/merthiolate, 1/10,000.
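The figures quoted in this section (3.5 ml hydrated gel per g of dry gel, 5 to 10 mg IgG retained per ml of hydrated gel, 200 ml of 1 mM HCl per g of gel, and IgGs diluted to 3 mg/ml) allow the quantities for a given coupling to be estimated. The following Python sketch illustrates the arithmetic only; the function name `coupling_plan`, the default capacity of 5 mg/ml, and the reading of "per g of gel" as per g of dry gel are assumptions made for illustration.

```python
def coupling_plan(igg_mg: float, capacity_mg_per_ml: float = 5.0) -> dict:
    """Estimate gel and buffer amounts for coupling `igg_mg` of purified IgG.

    capacity_mg_per_ml must lie within the 5 to 10 mg IgG per ml of
    hydrated gel quoted in the text (the default is the conservative end).
    """
    if not 5.0 <= capacity_mg_per_ml <= 10.0:
        raise ValueError("capacity quoted in the text is 5 to 10 mg IgG per ml gel")
    hydrated_gel_ml = igg_mg / capacity_mg_per_ml
    dry_gel_g = hydrated_gel_ml / 3.5   # 1 g dry gel gives 3.5 ml hydrated gel
    hcl_wash_ml = 200.0 * dry_gel_g     # 200 ml of 1 mM HCl per g (assumed dry gel)
    igg_volume_ml = igg_mg / 3.0        # IgGs are diluted to 3 mg/ml before coupling
    return {
        "hydrated_gel_ml": hydrated_gel_ml,
        "dry_gel_g": dry_gel_g,
        "hcl_wash_ml": hcl_wash_ml,
        "igg_volume_ml": igg_volume_ml,
    }


plan = coupling_plan(35.0, capacity_mg_per_ml=10.0)
print(plan)  # 3.5 ml hydrated gel, 1 g dry gel, 200 ml HCl, about 11.7 ml IgG solution
```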

1.C.4. Adsorption and elution

The 76 kDa protein is adsorbed and eluted as follows. The
membrane fraction Cs2d is suspended in 50 mM Tris-HCl (pH 8.0), 2 mM
EDTA, and then is filtered through a 0.45 µm membrane. The supernatant is
applied to the column, which is equilibrated with 50 mM Tris-HCl (pH 8.0),
2 mM EDTA, at a flow rate of about 10 ml/hour. The column is washed with
20 column volumes of 50 mM Tris-HCl (pH 8.0), 2 mM EDTA, and then with
2 to 6 volumes 10 mM phosphate buffer (pH 6.8).

The antigen is eluted with 100 mM glycine buffer (pH 2.5). The
eluate is collected in 3 ml fractions, to each of which is added 150 µl 1 M
phosphate buffer (pH 8.0). The optical density of each fraction is measured at
280 nm, and fractions containing the 76 kDa protein are pooled and stored at
-70°C.

Analysis by 10% SDS-PAGE reveals a single band at 76 kDa.
N-terminal sequencing was carried out on this purified 76 kDa preparation, and
the sequence obtained is as follows: EDDGFYTSVGYQIGEAAQMV (SEQ ID
NO:58).

The 32 kDa protein (GHPO 1360) or the 50 kDa protein (GHPO
750) is purified by immunoaffinity-based chromatography as follows. In order
to separate the 32 or 50 kDa protein from the contaminating proteins (the 35
and 47 kDa proteins, respectively), membrane fraction C4 is solubilized in 50
mM NaCO3 (pH 9.5) for 30 minutes at room temperature under stirring and the
preparation is centrifuged for 30 minutes at 200,000 x g at 4°C. The 47 and 35
kDa proteins are insoluble in the NaCO3 buffer and are eliminated in the pellet.

The supernatant is dialyzed against 50 mM Tris-HCl (pH 8.0), 2
mM EDTA, and then is filtered through a 0.45 µm membrane. The filtered
supernatant is applied to the column, which is equilibrated with 50 mM
Tris-HCl (pH 8.0), 2 mM EDTA, at a flow rate of about 10 ml/hour. The column
is washed with 20 column volumes of 50 mM Tris-HCl (pH 8.0), 2 mM EDTA,
and then with 2 to 6 volumes of 10 mM phosphate buffer (pH 6.8).

The antigen is eluted with 100 mM glycine buffer (pH 2.5). The
eluate is collected in 3 ml fractions, to each of which is added 150 µl 1 M
phosphate buffer (pH 8.0). The optical density of each fraction is measured at
280 nm, and fractions containing the 50 or 32 kDa protein are pooled and
stored at -70°C.

Analysis of the purified protein by 10% SDS-PAGE reveals single
bands at 50 and 32 kDa. N-terminal sequencing is carried out with the purified
50 kDa protein preparation. The sequence found is as follows:
MKEKFNRTKPHVNIGTIGHVDH (SEQ ID NO:73). Similarly, N-terminal
and internal sequencing is carried out with the purified 32 kDa preparation.
The sequences found are as follows: AHNANNATHNTKK (SEQ ID NO:74)
and KPAHNA (SEQ ID NO:75) (N-terminal), and IDKQPKAKK (SEQ ID
NO:76) and FWAKKQAE (SEQ ID NO:77) (internal).

1.D. Purification of the 76 kDa protein from membrane fraction Cs2d and
purification of the 32 kDa and 50 kDa proteins from membrane fraction
C4

The 76 kDa protein can also be purified as follows. A 40 ml
Q-Sepharose column (diameter: 2.5 cm; height: 8 cm) is prepared according to
the manufacturer's instructions (Pharmacia). The column is washed and
equilibrated with buffer B, containing 50 mM NaCO3 (pH 9.5), 100 µM
Pefabloc, and 0.1% Zwittergent 3-14. The chromatography is monitored by
measuring absorbance at 280 nm at the column exit.

One hundred and forty mg of protein from the membrane fraction
Cs2d resuspended in buffer B are applied to the column. The column is
washed with 0.1 M NaCl in buffer B, and then a 0.1-0.5 M NaCl gradient is
applied to the column. The fraction eluted between 0.35 and 0.45 M NaCl is
further purified on a 10 ml S-Sepharose column (diameter: 1.5 cm; height:
5 cm; up to 10 mg protein/ml of gel), which is prepared according to the
manufacturer's instructions (Pharmacia). The fraction obtained is dialyzed
against 50 mM acetate (pH 5.0) containing 100 µM Pefabloc and
0.1% Zwittergent 3-14, and then is applied to the column, which is equilibrated
with the acetate buffer.

The column is washed with the acetate buffer until the absorbance at
280 nm is stabilized (about 3 column volumes are required). Proteins are
eluted with a 0-0.5 M NaCl gradient in acetate buffer. The fraction eluted at
0.15 M NaCl is enriched with the 76 kDa protein.

The 32 kDa protein (GHPO 1360) can also be purified as follows.
Membrane fraction C4 is solubilized in 50 mM NaCO3 buffer (pH 9.5) at room
temperature for 30 minutes under stirring. The suspension is then centrifuged
at 200,000 x g for 30 minutes at 4°C. This allows the 32 and 35 kDa proteins
to be separated, since the 35 kDa protein is insoluble in the NaCO3 buffer. The
supernatant is dialyzed against 50 mM NaPO4 buffer (pH 7.0), and then is
applied to an SP-Sepharose column, which is equilibrated with the NaPO4
buffer. The column is washed with the NaPO4 buffer, and then a 0-0.5 M
NaCl gradient is applied to the column. The fraction eluted between 0.26 and
0.31 M contains the 32 kDa protein.

The 50 kDa protein can also be purified as follows. Membrane fraction
C4 is solubilized in 50 mM NaCO3 buffer (pH 9.5) at room temperature for
30 minutes while stirring. The suspension is then centrifuged at 200,000 x g
for 30 minutes at 4°C. This allows the 50 and 47 kDa proteins to be separated,
since the 47 kDa protein is insoluble in the NaCO3 buffer. The supernatant is
dialyzed against 50 mM NaPO4 buffer (pH 7.0).

A 40 ml Q-Sepharose column (diameter: 2.5 cm; height: 8 cm) is
prepared according to the manufacturer's instructions (Pharmacia), washed, and
equilibrated with buffer B (pH 9.5) (50 mM NaCO3, 100 µM Pefabloc, and
0.1% Zwittergent 3-14).

The chromatography is monitored by UV detection at 280 nm at the
column exit. One hundred and forty mg of protein solubilized as is described
above are applied to the column, which is then washed with buffer B until the
absorbance at 280 nm is stabilized. The proteins are eluted with a 0.1-0.5 M
NaCl gradient in buffer B (10-fold VT), which is followed by washing in
buffer B containing 0.5 M, and then 1 M, NaCl (2-fold VT). The fractions are
recovered, analyzed by SDS-PAGE, and pooled according to their
electrophoretic profiles.

Fraction 9, which corresponds to the beginning of the washing at 1 M
NaCl and contains acidic proteins, is further purified as follows. A 10 ml
DEAE-Sepharose column (diameter: 1.5 cm, height: 5 cm) is prepared
according to the manufacturer's instructions (Pharmacia) (up to 10 mg
protein/ml of gel). The column is washed and equilibrated with buffer B.
Chromatography is monitored as is described above.

Fraction 9 is dialyzed against buffer B and contains about 10 mg protein.
Fraction 9 is applied to the DEAE-Sepharose column. The column is washed
with buffer B until the absorbance at 280 nm is stabilized. The proteins are
eluted with a 0-0.5 M NaCl gradient in buffer B (10-fold VT), followed by
washing in buffer B containing 1 M NaCl (2-fold VT). Fractions are recovered
and analyzed by SDS-PAGE. The 50 kDa protein is found in the fractions
eluted at 0.3-0.4 M NaCl.

EXAMPLE 2: Identification of genes in the H. pylori genome, such as
genes encoding the 76 kDa proteins, the 32 kDa protein (GHPO 1360), and
the 50 kDa protein (GHPO 750), identification of signal sequences, and
primer design for amplification of genes lacking signal sequences

2.A. Creating H. pylori genomic databases 

The H. pylori genome was provided as a text file containing a single
contiguous string of nucleotides that had been determined to be 1.76
Megabases in length. The complete genome was split into 17 separate files
using the program SPLIT (Creativity in Action), giving rise to 16 contigs, each
containing 100,000 nucleotides, and a 17th contig containing the remaining
76,000 nucleotides. A header was added to each of the 17 files using the
format: >hpg0.txt (representing contig 1), >hpg1.txt (representing contig 2), etc.
The resulting 17 files, named hpg0 through hpg16, were then copied together to
form one file that represented the plus strand of the complete H. pylori genome.
The constructed database was given the designation "H." A negative strand
database of the H. pylori genome was created similarly by first creating a
reverse complement of the positive strand using the program SeqPup (D.G.
Gilbert, Indiana University Biology Department) and then performing the same
procedure as described above for the plus strand. This database was given the
designation "N."
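The construction of the "H" and "N" databases can be illustrated in a few lines of Python. This is a sketch under stated assumptions and not a reproduction of the SPLIT or SeqPup programs: the function names are hypothetical, the headers merely imitate the ">hpg0.txt" format described above, and the toy genome length of 1,676,000 nucleotides is the length implied by 16 contigs of 100,000 nucleotides plus one of 76,000.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_into_contigs(genome: str, size: int = 100_000) -> list[str]:
    """Cut the genome into consecutive contigs of at most `size` nucleotides."""
    return [genome[i:i + size] for i in range(0, len(genome), size)]


def build_database(genome: str, prefix: str = "hpg") -> str:
    """Join the contigs into one FASTA-style text with >hpgN.txt headers."""
    records = [
        f">{prefix}{index}.txt\n{contig}"
        for index, contig in enumerate(split_into_contigs(genome))
    ]
    return "\n".join(records) + "\n"


# Toy stand-in for the plus strand; a real run would read the genome text file.
plus_strand = "ACGT" * 419_000                               # 1,676,000 nucleotides
database_h = build_database(plus_strand)                      # plus strand ("H")
database_n = build_database(reverse_complement(plus_strand))  # minus strand ("N")

contigs = split_into_contigs(plus_strand)
print(len(contigs), len(contigs[0]), len(contigs[-1]))  # 17 100000 76000
```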

The regions predicted to encode open reading frames (ORFs) were
defined for the complete H. pylori genome using the program GENEMARK™
(Borodovsky et al., Comp. Chem. 17:123, 1993). A database was created from
a text file containing an annotated version of all ORFs predicted to be encoded
by the H. pylori genome for both the plus and minus strands, and was given the
designation "O." Each ORF was assigned a number indicating its location on
the genome and its position relative to other genes. No manipulation of the text
file was required.

2.B. Searching the H. pylori databases

The databases constructed as is described above were searched using the
program FASTA (Pearson et al., Proc. Natl. Acad. Sci. USA 85:2444-2448,
1988). FASTA was used for searching either a DNA sequence against either of
the gene databases ("H" and/or "N"), or a peptide sequence against the ORF
library ("O"). TFASTX was used to search a peptide sequence against all
possible reading frames of a DNA database ("H" and/or "N" libraries).
With potential frameshifts also being resolved, FASTX was used for searching
the translated reading frames of a DNA sequence against either a DNA database,
or a peptide sequence against the protein database.

2.C. Isolation of DNA sequences from the H. pylori genome 

The FASTA searches against the constructed DNA databases identified
exact nucleotide coordinates on one or more of the isolated contigs, and
therefore the location of the target DNA. Once the exact location of the target
sequence was known, the contig identified to carry the gene was exported into
the software package MapDraw (DNAStar, Inc.) and the gene was isolated.
Gene sequences with flanking DNA were then excised and copied into the
EditSeq software package (DNAStar, Inc.) for further analysis.

2.D. Identification of signal sequences 

The deduced protein encoded by a target gene sequence is analyzed
using the PROTEAN software package (DNAStar, Inc.). This analysis predicts
those areas of the protein that are hydrophobic by using the Kyte-Doolittle
algorithm, and identifies any potential polar residues preceding the
hydrophobic core region, which is typical for many signal sequences. For
confirmation, the target protein is then searched against a PROSITE database
(DNAStar, Inc.) consisting of motifs and signatures. Characteristic of many
signal sequences, and of hydrophobic regions in general, is the identification of
predicted prokaryotic lipid attachment sites. Where confirmation between the
two approaches is apparent at the N-terminus of any protein, putative cleavage
sites are sought. Specifically, this includes the presence of either an Alanine
(A), Serine (S), or Glycine (G) residue immediately after the core hydrophobic
region. In the case of lipoproteins, a Cysteine (C) residue would be identified
as the +1 residue, post-cleavage.
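The hydrophobicity analysis and the search for an A, S, or G residue immediately after the hydrophobic core can be sketched as follows. This is an illustration in Python and not the PROTEAN implementation: the hydropathy values are the published Kyte-Doolittle scale, whereas the window length (9), the threshold (1.6), the 40-residue search limit, and the function names are assumptions chosen for the example. The lipoprotein case (Cysteine at +1) and the PROSITE confirmation step are not covered.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean Kyte-Doolittle score for every window along the protein."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_cleavage_sites(
    protein: str,
    window: int = 9,
    threshold: float = 1.6,
    search_limit: int = 40,
) -> list[int]:
    """0-based positions of A/S/G residues that directly follow a
    hydrophobic window within the first `search_limit` residues."""
    protein = protein.upper()
    sites = []
    for start, mean_score in enumerate(hydropathy_profile(protein, window)):
        next_index = start + window
        if next_index >= min(len(protein), search_limit):
            break
        if mean_score >= threshold and protein[next_index] in "ASG":
            sites.append(next_index)
    return sites


# Synthetic N-terminus: polar start, nine-leucine core, then Ala.
print(candidate_cleavage_sites("MKKTLLLLLLLLLADDDDDD"))  # [13]
```

In this sketch a reported position marks the A, S, or G residue itself; the sequence used is synthetic and is not one of the GHPO proteins.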

2.E. Rational design of PCR primers based on the identification of signal 
sequences 

In order to clone gene sequences as N-terminus translational fusions for
the generation of recombinant proteins with N-terminal Histidine tags, the gene
sequence that specifies the signal sequence is omitted. The 5'-end of the
gene-specific portion of the N-terminal primer is designed to start at the first
codon beyond the cleavage site. In the case of lipoproteins, the 5'-end of the
N-terminal primer begins at the second codon, immediately after the modifiable
residue at position +1 post-cleavage. The omission of the signal sequence from
the recombinant allows for one-step purification, and potential problems
associated with insertion of signal sequences in the membrane of the host strain
carrying the hybrid construct are avoided.

EXAMPLE 3: Preparation of isolated DNA encoding GHPO 386, GHPO
789, GHPO 1516, GHPO 896, GHPO 1360, and GHPO 750, and
production of these proteins as histidine-tagged fusion proteins

3.A. Preparation of genomic DNA from Helicobacter pylori 

Helicobacter pylori strain ORV2001, stored in LB medium containing
50% glycerol at -70°C, is grown on Columbia agar containing 7% sheep blood
for 48 hours under microaerophilic conditions (8-10% CO2, 5-7% O2, and
85-87% N2). Cells are harvested, washed with PBS (pH 7.2), and DNA is then
extracted from the cells using the Rapid Prep Genomic DNA Isolation kit
(Pharmacia Biotech).

3.B. PCR amplification

DNA encoding GHPO 386, GHPO 789, GHPO 1516, GHPO 896,
GHPO 1360, and GHPO odd numbers), 65, and 67 is amplified from genomic
DNA, as can be prepared as is described above, by the Polymerase Chain
Reaction (PCR) using the following primers:
GHPO 386:
N-terminal primer:
5'-CTGAATTCGATTTCAAGGAGAAAACATGAAA-3' (SEQ ID NO:59); and
C-terminal primer:
5'-CCGCTCGAGTTAGTAAGCGAACACATAATT-3' (SEQ ID NO:60).

GHPO 789:
N-terminal primer:
5'-CGCGGATCCGAATCCAATTTAATCCAAAAAGG-3' (SEQ ID NO:61); and
C-terminal primer:
5'-CCGCTCGAGTTAGTAAGCGAACACATAGTTCAA-3' (SEQ ID NO:62).

GHPO 1516:
N-terminal primer:
5'-CGCGGATCCGAATCCAATTTAATCCAAAAAGG-3' (SEQ ID NO:56); and
C-terminal primer:
5'-CCGCTCGAGTTAAGTAAGCGAACACATATTCAA-3' (SEQ ID NO:57).

GHPO 896:
N-terminal primer:
5'-CGCGGATCCGAAGTTTCTTTGTATCAAAG-3' (SEQ ID NO:63); and
C-terminal primer:
5'-CCGCTCGAGTTAGTAAGCAAACACATAATTGTG-3' (SEQ ID NO:64).

GHPO 1360:
N-terminal primer:
5'-CGCGGATCCGAATGAAAAAAAATATCTTAAAT-3' (SEQ ID NO:69); and
C-terminal primer:
5'-CCGCTCGAGTTACTTGTTGATAACAATTTT-3' (SEQ ID NO:70).

GHPO 750:
N-terminal primer:
5'-CGCGGATCCGAATGGCAAAAGAAAAGTTTAAC-3' (SEQ ID NO:71); and
C-terminal primer:
5'-CCGCTCGAGTTATTCAATAATATTGCTCAC-3' (SEQ ID NO:72).

GHPO 711:
N-terminal primer:
5'-GGGAATTCAAAAAAACGAAAAAAACG-3' (SEQ ID NO:83); and
C-terminal primer:
5'-CCCCTCGAGTTAATAGGCAAACAC-3' (SEQ ID NO:84).

The N-terminal and C-terminal primers for each clone both include a 5'
clamp and a restriction enzyme recognition sequence for cloning purposes
(BamHI (GGATCC) and XhoI (CTCGAG) recognition sequences).
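The layout of such primers (5' clamp, restriction site, gene-specific portion) can be made explicit with a short Python sketch. The function names and the three-nucleotide clamps "CGC" and "CCG" are read off the GHPO 789 primers listed above and are assumptions for illustration; the coding-strand 3' end used in the example was obtained by reverse-complementing the printed C-terminal primer and is not taken from a sequence listing.

```python
BAMHI = "GGATCC"   # BamHI recognition sequence
XHOI = "CTCGAG"    # XhoI recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def n_terminal_primer(gene_5prime: str, clamp: str = "CGC", site: str = BAMHI) -> str:
    """Forward primer: clamp + restriction site + 5' end of the coding
    sequence, starting at the first codon retained after the signal sequence."""
    return clamp + site + gene_5prime


def c_terminal_primer(gene_3prime_with_stop: str, clamp: str = "CCG", site: str = XHOI) -> str:
    """Reverse primer: clamp + restriction site + reverse complement of the
    3' end of the coding sequence (stop codon included)."""
    return clamp + site + reverse_complement(gene_3prime_with_stop)


forward = n_terminal_primer("GAATCCAATTTAATCCAAAAAGG")
reverse = c_terminal_primer("TTGAACTATGTGTTCGCTTACTAA")
print(forward)  # CGCGGATCCGAATCCAATTTAATCCAAAAAGG
print(reverse)  # CCGCTCGAGTTAGTAAGCGAACACATAGTTCAA
```

Under these assumptions the two strings reproduce the GHPO 789 primers (SEQ ID NO:61 and SEQ ID NO:62).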

Amplification of gene-specific DNA is carried out using a heat-stable
DNA Polymerase (e.g., Thermalase DNA Polymerase (Amresco)) according to
the manufacturer's instructions. The reaction mixture, which is brought to a
final volume of 100 µl with distilled water, is as follows:

dNTPs mix               200 µM
10x ThermoPol buffer    10 µl
primers                 300 nM each
DNA template            50 ng
DNA polymerase          2 units



Appropriate amplification reaction conditions can readily be determined
by one skilled in the art. In the present case, the following conditions were
used. For GHPO 386 and GHPO 789, in a reaction containing Taq DNA
polymerase (Appligene), a denaturing step was carried out at 95°C for 30
seconds, followed by an annealing step at 50°C for one minute, and an
extension step at 72°C for 2 minutes and 30 seconds. Twenty-five cycles were
carried out. For GHPO 896, in a reaction containing Taq DNA polymerase, a
denaturing step was carried out at 97°C for 30 seconds, followed by an
annealing step at 50°C for one minute, and an extension step at 72°C for 2
minutes and 30 seconds. Twenty-five cycles were carried out. The same
reaction conditions were used for GHPO 1516 as GHPO 896, except that Vent
DNA polymerase was used for clone GHPO 1516, instead of Taq DNA
polymerase, and the annealing temperature was 55°C. For GHPO 1360 and
GHPO 750, Thermalase DNA polymerase was used. A denaturing step was
carried out at 95°C for 30 seconds, followed by an annealing step at 55°C for
one minute, and an extension step at 72°C for 2 minutes. Thirty cycles were
carried out. For GHPO 711, Vent DNA polymerase was used. A denaturing
step was carried out at 94°C for 30 seconds, followed by an annealing step at
50°C for 30 seconds, and an extension step at 72°C for 1 minute. Twenty-five
cycles were carried out.
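The cycling conditions given above can be collected in a small data structure, which also makes it simple to estimate the minimum run time of each program. The following Python sketch is illustrative only: the class and field names are hypothetical, and the estimate ignores ramp times between temperatures and any initial denaturation or final extension, none of which are specified in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    """Step durations in seconds and the number of cycles."""
    denature_s: int
    anneal_s: int
    extend_s: int
    cycles: int

    def total_minutes(self) -> float:
        # Hold time only; ramping between temperatures is not included.
        return self.cycles * (self.denature_s + self.anneal_s + self.extend_s) / 60


PROGRAMS = {
    "GHPO 386 / GHPO 789": CyclingProgram(30, 60, 150, 25),
    "GHPO 896 / GHPO 1516": CyclingProgram(30, 60, 150, 25),
    "GHPO 1360 / GHPO 750": CyclingProgram(30, 60, 120, 30),
    "GHPO 711": CyclingProgram(30, 30, 60, 25),
}

for name, program in PROGRAMS.items():
    print(f"{name}: {program.total_minutes():.0f} min")
# GHPO 386 / GHPO 789: 100 min
# GHPO 896 / GHPO 1516: 100 min
# GHPO 1360 / GHPO 750: 105 min
# GHPO 711: 50 min
```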

3.C. Transformation and selection of transformants

A single PCR product is thus amplified and is then digested at 37°C for
2 hours with BamHI and XhoI concurrently in a 20 µl reaction volume. The
digested product is ligated to similarly cleaved pET28a (Novagen) that is
dephosphorylated prior to the ligation by treatment with Calf Intestinal
Alkaline Phosphatase (CIP). The gene fusion constructed in this manner allows
one-step affinity purification of the resulting fusion protein because of the
presence of histidine residues at the N-terminus of the fusion protein, which are
encoded by the vector.

The ligation reaction (20 µl) is carried out at 14°C overnight and then is
used to transform 100 µl fresh E. coli XL1-Blue competent cells (Novagen).
The cells are incubated on ice for 2 hours, then heat-shocked at 42°C for
30 seconds, and returned to ice for 90 seconds. The samples are then added to
1 ml LB broth in the absence of selection and grown at 37°C for 2 hours. The
cells are then plated out on LB agar containing kanamycin (50 µg/ml) at a 10x
and neat dilution and incubated overnight at 37°C. The following day, 50
colonies are picked onto secondary plates and incubated at 37°C overnight.
Five colonies are picked into 3 ml LB broth supplemented with
kanamycin (100 µg/ml) and are grown overnight at 37°C. Plasmid DNA is
extracted using the Qiagen mini-prep method and is quantitated by agarose
gel electrophoresis.

PCR is performed with the gene-specific primers under the conditions
stated above and transformant DNA is confirmed to contain the desired insert.
If PCR-positive, one of the five plasmid DNA samples (500 ng) extracted from
the E. coli XL1-Blue cells is used to transform competent BL21 (λDE3) E. coli
cells (Novagen; as described previously). Transformants (10) are
picked onto selective kanamycin (50 µg/ml) containing LB agar plates and
stored as a research stock in LB containing 50% glycerol.

3.D. Purification of recombinant proteins 

One ml of frozen glycerol stock prepared as described in 3.C. is used to
inoculate 50 ml of LB medium containing 25 µg/ml of kanamycin in a 250 ml
Erlenmeyer flask. The flask is incubated at 37°C for 2 hours or until the
absorbance at 600 nm (OD600) reaches 0.4-1.0. The culture is stopped from
growing by placing the flask at 4°C overnight. The following day, 10 ml of the
overnight culture are used to inoculate 240 ml LB medium containing
kanamycin (25 µg/ml), with the initial OD600 about 0.02-0.04. Four flasks are
inoculated for each ORF.
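The initial OD600 quoted for the 250 ml cultures follows from the 25-fold dilution of the starter culture (10 ml into 240 ml). The short Python sketch below illustrates the calculation; the function name is hypothetical, and the calculation assumes that optical density scales linearly with dilution, which is a reasonable approximation only at low densities.

```python
def od_after_dilution(starter_od: float, inoculum_ml: float, medium_ml: float) -> float:
    """OD600 expected immediately after adding the inoculum to fresh medium."""
    return starter_od * inoculum_ml / (inoculum_ml + medium_ml)


# 10 ml of starter culture into 240 ml of LB is a 25-fold dilution.
for starter_od in (0.5, 1.0):
    print(starter_od, round(od_after_dilution(starter_od, 10, 240), 3))
# 0.5 0.02
# 1.0 0.04
```

On this basis, the stated initial OD600 of about 0.02-0.04 corresponds to a starter culture at an OD600 of about 0.5-1.0.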

The cells are grown to an OD600 of 1.0 (about 2 hours at 37°C), a 1 ml
sample is harvested by centrifugation, and the sample is analyzed by
SDS-PAGE to detect any leaky expression. The remaining culture is induced
with 1 mM IPTG and the induced cultures are grown for an additional 2 hours at
37°C.

The final OD600 is taken and the cells are harvested by centrifugation at
5,000 x g for 15 minutes at 4°C. The supernatant is discarded and the pellets
are resuspended in 50 mM Tris-HCl (pH 8.0), 2 mM EDTA. Two hundred and
fifty ml of buffer are used for a 1 L culture and the cells are recovered by
centrifugation at 12,000 x g for 20 minutes. The supernatant is discarded and
the pellets are stored at -45°C.

3.E. Protein purification

Pellets obtained from 3.D. are thawed and resuspended in 95 ml of 50
mM Tris-HCl (pH 8.0). Pefabloc and lysozyme are added to final
concentrations of 100 µM and 100 µg/ml, respectively. The mixture is
homogenized with magnetic stirring at 5°C for 30 minutes. Benzonase (Merck)
is added at a 1 U/ml final concentration, in the presence of 10 mM MgCl2, to
ensure total digestion of the DNA. The suspension is sonicated (Branson
Sonifier 450) for 3 cycles of 2 minutes each at maximum output. The
homogenate is spun in a centrifuge at 19,000 x g for 15 minutes and both the
supernatant and the pellet are analyzed by SDS-PAGE to detect the cellular
location of the target protein in the soluble or insoluble fractions, as is
described further below.

3.E.1. Soluble fraction

If the target protein is produced in a soluble form (i.e., in the supernatant
obtained in 3.E.), NaCl and imidazole are added to the supernatant to final
concentrations of 50 mM Tris-HCl (pH 8.0), 0.5 M NaCl, and 10 mM
imidazole (buffer A). The mixture is filtered through a 0.45 µm membrane and
loaded onto an IMAC column (Pharmacia HiTrap chelating Sepharose; 1 ml)
that has been charged with nickel ions according to the manufacturer's
recommendations. After loading, the column is washed with 50 column
volumes of buffer A and the recombinant target protein is eluted with 5 ml of
buffer B (50 mM Tris-HCl (pH 8.0), 0.5 M NaCl, 500 mM imidazole).

The elution profile is monitored by measuring the absorbance of the
fractions at 280 nm. Fractions corresponding to the protein peak are pooled,
dialyzed against PBS containing 0.5 M arginine, filtered through a 0.22 µm
membrane, and stored at -45°C.

3.E.2. Insoluble fraction

If the target protein is expressed in the insoluble fraction (pellets
obtained from 3.E.), purification is conducted under denaturing conditions.
NaCl, imidazole, and urea are added to the resuspended pellet to final
concentrations of 50 mM Tris-HCl (pH 8.0), 0.5 M NaCl, 10 mM imidazole,
and 6 M urea (buffer C). After complete solubilization, the mixture is filtered
through a 0.45 µm membrane and loaded onto an IMAC column.

The purification procedures on the IMAC column are the same as
described in 3.E.1., except that 6 M urea is included in all buffers used and 10
column volumes of buffer C are used to wash the column after protein loading,
instead of 50 column volumes.

The protein fractions eluted from the IMAC column with buffer D
(buffer C containing 500 mM imidazole) are pooled. Arginine is added to the
solution to a final concentration of 0.5 M and the mixture is dialyzed against
PBS containing 0.5 M arginine and various concentrations of urea (4 M, 3 M,
2 M, 1 M, and 0.5 M) to progressively decrease the concentration of urea. The
final dialysate is filtered through a 0.22 µm membrane and stored at -45°C.

Alternatively, when the above purification process is not as efficient as it
should be, two other processes may be used as follows. A first alternative
involves the use of a mild denaturant, N-octyl glucoside (NOG). Briefly, a
pellet obtained in 3.E. is homogenized in 5 mM imidazole, 500 mM sodium
chloride, 20 mM Tris-HCl (pH 7.9) by microfluidization at a pressure of 15,000
psi and is clarified by centrifugation at 4,000-5,000 x g. The pellet is
recovered, resuspended in 50 mM NaPO4 (pH 7.5) containing 1-2%
weight/volume NOG, and homogenized. The NOG-soluble impurities are removed by
centrifugation. The pellet is extracted once more by repeating the preceding
extraction step. The pellet is dissolved in 8 M urea, 50 mM Tris (pH 8.0). The
urea-solubilized protein is diluted with an equal volume of 2 M arginine, 50
mM Tris (pH 8.0), and is dialyzed against 1 M arginine for 24-48 hours to
remove the urea. The final dialysate is filtered through a 0.22 µm membrane
and stored at -45°C.
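
The dilution step in the NOG procedure can be checked with a short mass-balance sketch. The 10 ml volumes are arbitrary example values (the text specifies only "an equal volume"), and the helper name `mix` is hypothetical.

```python
def mix(solutions):
    """Combine (volume_ml, {component: molar}) pairs; return final molarities."""
    total_volume = sum(volume for volume, _ in solutions)
    amounts = {}
    for volume, composition in solutions:
        for component, molar in composition.items():
            # Amount in mmol-equivalents (M x ml); only ratios matter here.
            amounts[component] = amounts.get(component, 0.0) + molar * volume
    return {name: amount / total_volume for name, amount in amounts.items()}


# 10 ml is an assumed example volume; the two volumes must simply be equal.
urea_extract = (10.0, {"urea": 8.0, "Tris": 0.05})
arginine_diluent = (10.0, {"arginine": 2.0, "Tris": 0.05})

after_dilution = mix([urea_extract, arginine_diluent])

if __name__ == "__main__":
    print(after_dilution)
```

Under these assumptions the mixture entering dialysis is about 4 M urea, 1 M arginine, and 50 mM Tris, so the arginine concentration already matches the 1 M arginine dialysis buffer and only the urea has to be removed.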

A second alternative involves the use of a strong denaturant, such as
guanidine hydrochloride. Briefly, a pellet obtained in 3.E. is homogenized in 5
mM imidazole, 500 mM sodium chloride, 20 mM Tris-HCl (pH 7.9) by
microfluidization at a pressure of 15,000 psi and clarified by centrifugation at
4,000-5,000 x g. The pellet is recovered, resuspended in 6 M guanidine
hydrochloride, and passed through an IMAC column charged with Ni2+. The
bound antigen is eluted with 8 M urea (pH 8.5). Beta-mercaptoethanol is added
to the eluted protein to a final concentration of 1 mM, then the eluted protein is
passed through a Sephadex G-25 column equilibrated in 0.1 M acetic acid.
Protein eluted from the column is slowly added to 4 volumes of 50 mM
phosphate buffer (pH 7.0). The protein remains in solution.

3.F. Evaluation of the protective activity of the purified protein 

A protection test that was carried out for testing the protective activity of
the purified, native proteins is described above. This test can also be used for
testing the protective efficacy of recombinant proteins. Alternatively, the
following test can be used.

Groups of 10 OF1 mice (IFFA Credo) are immunized rectally with 25
µg of the purified recombinant protein, admixed with 1 µg of cholera toxin
(Berna) in physiological buffer. Mice are immunized on days 0, 7, 14, and 21.
Fourteen days after the last immunization, the mice are challenged with H.
pylori strain ORV2001 grown in liquid media (the cells are grown on agar
plates, as described in 1.A., and, after harvest, the cells are resuspended in
Brucella broth; the flasks are then incubated overnight at 37°C). Fourteen days
after challenge, the mice are sacrificed and their stomachs are removed. The
amount of H. pylori is determined by measuring the urease activity in the
stomach and by culture.
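
The timing of this test can be laid out as day offsets from the first immunization. The sketch below simply restates the intervals given above; the function and constant names are hypothetical.

```python
IMMUNIZATION_DAYS = (0, 7, 14, 21)   # rectal immunizations, as stated in the text
CHALLENGE_DELAY_DAYS = 14            # challenge 14 days after the last immunization
SACRIFICE_DELAY_DAYS = 14            # sacrifice 14 days after challenge
ANTIGEN_UG_PER_DOSE = 25             # purified recombinant protein per dose
CHOLERA_TOXIN_UG_PER_DOSE = 1        # adjuvant per dose


def protection_test_schedule(immunization_days=IMMUNIZATION_DAYS):
    """Return the key study days and cumulative doses per mouse."""
    challenge_day = max(immunization_days) + CHALLENGE_DELAY_DAYS
    sacrifice_day = challenge_day + SACRIFICE_DELAY_DAYS
    doses = len(immunization_days)
    return {
        "immunization_days": list(immunization_days),
        "challenge_day": challenge_day,
        "sacrifice_day": sacrifice_day,
        "total_antigen_ug": doses * ANTIGEN_UG_PER_DOSE,
        "total_cholera_toxin_ug": doses * CHOLERA_TOXIN_UG_PER_DOSE,
    }


if __name__ == "__main__":
    print(protection_test_schedule())
```

With the stated intervals, challenge falls on day 35 and sacrifice on day 49.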

3.G. Production of monospecific polyclonal antibodies

3.G.1. Hyperimmune rabbit antiserum

New Zealand rabbits are injected both subcutaneously and
intramuscularly with 100 µg of a purified fusion polypeptide, as obtained in
3.E.1. or 3.E.2., in the presence of Freund's complete adjuvant and in a total
volume of approximately 2 ml. Twenty-one and 42 days after the initial
injection, booster doses, which are identical to priming doses, except that
Freund's incomplete adjuvant is used, are administered in the same way.
Fifteen days after the last injection, animal serum is recovered,
decomplemented, and filtered through a 0.45 µm membrane.

3.G.2. Mouse hyperimmune ascites fluid

Ten mice are injected subcutaneously with 10-50 µg of a purified fusion
polypeptide, as obtained in 3.E.1. or 3.E.2., in the presence of Freund's
complete adjuvant and in a volume of approximately 200 µl. Seven and 14
days after the initial injection, booster doses, which are identical to the priming
doses, except that Freund's incomplete adjuvant is used, are administered in the
same way. Twenty-one and 28 days after the initial injection, mice receive 50
µg of the antigen alone intraperitoneally. On day 21, mice are also injected
intraperitoneally with sarcoma 180/TG cells CM26684 (Lennette et al.,
Diagnostic Procedures for Viral, Rickettsial, and Chlamydial Infections, 5th
Ed., Washington DC, American Public Health Association, 1979). Ascites
fluid is collected 10-13 days after the last injection.

EXAMPLE 4: Methods for producing transcriptional fusions lacking His-tags

Methods for amplification and cloning of DNA encoding the
polypeptides of the invention as transcriptional fusions lacking His-tags are
described as follows. Two PCR primers for each clone are designed based
upon the sequences of the polynucleotides that encode them (SEQ ID NOs:1-21
(odd numbers), 65, and 67). These primers can be used to amplify DNA
encoding the polypeptides of the invention from any Helicobacter pylori strain,
including, for example, ORV2001 and the H. pylori strain deposited with the
American Type Culture Collection (ATCC, Rockville, Maryland) as ATCC
number 43579, as well as from other Helicobacter species.

The N-terminal primers are designed to include the ribosome binding
site of the target gene, the ATG start site, the signal sequence (if any), and the
cleavage site. The N-terminal primers can include a 5' clamp and restriction
endonuclease recognition site, such as that for BamHI (GGATCC), which
facilitates subsequent cloning. Similarly, the C-terminal primers can include a
restriction endonuclease recognition site, such as that for XhoI (CTCGAG),
which can be used in subsequent cloning, and a TAA stop codon. Specific
primers that can be used are listed above.
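
The primer layout described above can be sketched as follows. This is only an illustration of the general design and does not reproduce the specific primers listed elsewhere in this document. The "CGC" clamp is an assumed placeholder, and the gene-specific stretches are short excerpts read from SEQ ID NO:1 (the region from the presumed ribosome binding site through the first codons, and the last four codons); real primers would be longer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def forward_primer(gene_5prime, clamp="CGC", site="GGATCC"):
    """5' clamp + BamHI site + sense-strand sequence starting at the RBS."""
    return clamp + site + gene_5prime


def reverse_primer(gene_3prime, clamp="CGC", site="CTCGAG", stop="TAA"):
    """5' clamp + XhoI site + reverse complement of (gene 3' end + stop codon)."""
    return clamp + site + reverse_complement(gene_3prime + stop)


# Excerpts from SEQ ID NO:1, used here only as an example.
GENE_5PRIME = "AAGGAGAAAACATGAAAAAAACC"   # presumed RBS, ATG, first codons
GENE_3PRIME = "GTGTTCGCTTAC"              # codons for Val Phe Ala Tyr

if __name__ == "__main__":
    print(forward_primer(GENE_5PRIME))
    print(reverse_primer(GENE_3PRIME))
```

Because the reverse primer anneals to the sense strand, the stop codon appears in it as its reverse complement (TTA) immediately after the XhoI site.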

Amplification of genes encoding the polypeptides of the invention can
be carried out using Vent DNA polymerase (New England Biolabs) or Taq
DNA polymerase (Appligene) under the conditions described above in
Example 3. Alternatively, Thermalase DNA polymerase or Pwo DNA
polymerase (Boehringer Mannheim) can be used, according to instructions
provided by the manufacturers.

A single PCR product for each clone is amplified and can be cloned into
BamHI-XhoI cleaved pET24, resulting in construction of transcriptional fusions
that permit expression of the proteins without His-tags. The expressed products
can be purified as denatured proteins that are refolded by dialysis into 1 M
arginine.

Cloning into pET24 allows transcription of genes from the T7 promoter,
which is supplied by the vector, but relies upon binding of the ribosome
to the intrinsic ribosome binding site of the genes for translation, and
thereby expression of the complete ORF. The amplification, digestion, and
cloning protocols are as described above for constructing translational fusions.

EXAMPLE 5: Purification of the polypeptides of the invention by 
immunoaffinity 

5.A. Purification of specific IgGs 

An immune serum, prepared as described in section 3.G., is applied
to a protein A Sepharose Fast Flow column (Pharmacia) equilibrated in 100
mM Tris-HCl (pH 8.0). The resin is washed by applying 10 column volumes
of 100 mM Tris-HCl and 10 volumes of 10 mM Tris-HCl (pH 8.0) to the
column. IgG antibodies are eluted with 0.1 M glycine buffer (pH 3.0) and are
collected in 5 ml fractions to each of which is added 0.25 ml 1 M Tris-HCl
(pH 8.0). The optical density of the eluate is measured at 280 nm and the
fractions containing the IgG antibodies are pooled, dialyzed against 50 mM
Tris-HCl (pH 8.0), and, if necessary, stored frozen at -70°C.

5.B. Preparation of the column 

An appropriate amount of CNBr-activated Sepharose 4B gel (1 g of
dried gel provides for approximately 3.5 ml of hydrated gel; gel capacity is
from 5 to 10 mg coupled IgG/ml of gel) manufactured by Pharmacia
(17-0430-01) is suspended in 1 mM HCl buffer and washed using a Buchner
funnel by adding small quantities of 1 mM HCl buffer. The total volume of
buffer is 200 ml per gram of gel.

Purified IgG antibodies are dialyzed for 4 hours at 20±5°C against
50 volumes of 500 mM sodium phosphate buffer (pH 7.5). The antibodies are
then diluted in 500 mM phosphate buffer (pH 7.5) to a final concentration of 3
mg/ml.
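
The quantities above can be combined into a small planning calculation. The sketch below uses only the figures stated in the text (3.5 ml hydrated gel per gram, 5 to 10 mg IgG per ml of gel, 200 ml of 1 mM HCl per gram, IgG at 3 mg/ml); the function name and the 35 mg example are assumptions.

```python
HYDRATED_ML_PER_G = 3.5        # ml hydrated gel per g dried gel
WASH_ML_PER_G = 200.0          # ml of 1 mM HCl per g dried gel
IGG_COUPLING_MG_PER_ML = 3.0   # IgG concentration used for coupling


def coupling_plan(igg_mg, capacity_mg_per_ml=5.0):
    """Estimate gel and buffer needs for coupling a given mass of IgG."""
    if not 5.0 <= capacity_mg_per_ml <= 10.0:
        raise ValueError("stated gel capacity is 5 to 10 mg IgG per ml of gel")
    gel_ml = igg_mg / capacity_mg_per_ml
    dry_gel_g = gel_ml / HYDRATED_ML_PER_G
    return {
        "gel_ml": gel_ml,
        "dry_gel_g": dry_gel_g,
        "hcl_wash_ml": dry_gel_g * WASH_ML_PER_G,
        "igg_solution_ml": igg_mg / IGG_COUPLING_MG_PER_ML,
    }


if __name__ == "__main__":
    # Example: 35 mg of purified IgG, planned at the low end of the capacity range.
    print(coupling_plan(35.0))
```

Planning at 5 mg/ml is the conservative choice, since it yields the largest gel volume for a given amount of antibody.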

IgG antibodies are mixed with the gel overnight at 5±3°C. The gel is
packed into a chromatography column and is washed with 2 column volumes of
500 mM phosphate buffer (pH 7.5), and 1 column volume of 50 mM sodium
phosphate buffer, containing 500 mM NaCl (pH 7.5). The gel is then
transferred to a tube, mixed with 100 mM ethanolamine (pH 7.5) for 4 hours at
room temperature, and washed twice with 2 column volumes of PBS. The gel
is then stored in 1/10,000 PBS/merthiolate. The amount of IgG antibodies
coupled to the gel is determined by measuring the optical density (OD) at 280
nm of the IgG solution and the direct eluate, plus washings.
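
One way to turn these OD readings into a coupled amount is a simple difference calculation, sketched below. The conversion factor of 1.4 absorbance units per mg/ml of IgG (1 cm path) is a commonly used approximation and an assumption here, as are the example readings; the text does not give either.

```python
IGG_A280_PER_MG_ML = 1.4   # assumed A280 of a 1 mg/ml IgG solution, 1 cm path


def igg_mg(a280, volume_ml, a280_per_mg_ml=IGG_A280_PER_MG_ML):
    """Convert an A280 reading and a volume into mg of IgG."""
    return a280 / a280_per_mg_ml * volume_ml


def coupled_igg_mg(a280_applied, applied_ml, a280_unbound, unbound_ml):
    """IgG applied minus IgG recovered in the direct eluate plus washings."""
    return igg_mg(a280_applied, applied_ml) - igg_mg(a280_unbound, unbound_ml)


if __name__ == "__main__":
    # Hypothetical readings: 10 ml applied at A280 = 4.2 (about 3 mg/ml);
    # 20 ml of pooled eluate and washings at A280 = 0.21.
    coupled = coupled_igg_mg(4.2, 10.0, 0.21, 20.0)
    applied = igg_mg(4.2, 10.0)
    print(f"coupled: {coupled:.1f} mg ({100 * coupled / applied:.0f}% of applied)")
```

Dividing the coupled mass by the gel volume gives a value that can be compared with the 5 to 10 mg/ml capacity quoted for the gel.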

5.C. Adsorption and elution of the antigen 

An antigen solution in 50 mM Tris-HCl (pH 8.0), 2 mM EDTA, for
example, the supernatant obtained in 3.E.1. or the solubilized pellet obtained in
3.E.2., after centrifugation and filtration through a 0.45 µm membrane, is
applied to a column equilibrated with 50 mM Tris-HCl (pH 8.0), 2 mM EDTA,
at a flow rate of about 10 ml/hour. The column is then washed with
20 volumes of 50 mM Tris-HCl (pH 8.0), 2 mM EDTA. Alternatively,
adsorption can be achieved by mixing overnight at 5±3°C.

The adsorbed gel is washed with 2 to 6 volumes of 10 mM sodium
phosphate buffer (pH 6.8) and the antigen is eluted with 100 mM glycine buffer
(pH 2.5). The eluate is recovered in 3 ml fractions, to each of which is added
150 µl of 1 M sodium phosphate buffer (pH 8.0). Absorbance is measured at
280 nm for each fraction; those fractions containing the antigen are pooled and
stored at -20°C.

EXAMPLE 6: The GHPO 1360 polypeptide is useful as a serodiagnostic
tool for H. pylori infection

The reactivity of patient sera against H. pylori proteins was analyzed by
immunoblot technique. Briefly, total lysate of H. pylori strain ORV2001 was
subjected to SDS-PAGE electrophoresis (BioRad Protean II system) on a
12.5% gel. Proteins were electrotransferred onto a nitrocellulose paper for
immunoblot assay. After blocking, the nitrocellulose paper was incubated with
patient sera (1:500 diluted in blocking buffer) for one hour at room
temperature, washed, and further incubated with peroxidase-conjugated goat
anti-human IgG. The positive bands were revealed by incubation with the
appropriate substrates. The results showed that the H. pylori-positive ulcer
patient sera react specifically with proteins having molecular weights between
50 and 60 kDa and about 30 to 35 kDa. To identify the nature of these
proteins, the reactivities of the patient sera were analyzed by immunoblot assay
against purified proteins with similar molecular weights: urease (67 kDa and 30
kDa), catalase (54 kDa), heat-shock protein B (60 kDa), and the GHPO 1360
polypeptide (32 kDa) expressed and purified as described in Example 5. All
patient sera showed strong reactivity against the GHPO 1360 polypeptide, but
the reactivities against other purified proteins were quite variable. These
results show that the GHPO 1360 polypeptide is a useful antigen for use in
diagnosis of H. pylori infection.

Other embodiments are within the following claims.

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2798 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 328...2451
(D) OTHER INFORMATION:

(A) NAME/KEY: Signal Sequence
(B) LOCATION: 328...385
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

TGGTCCTGGC ATTCCGAGGT TCGAATCCTT GCACCCCAGC CATTTTTCCT TATTTTTTGG 60

CGCGGAGTAG AGCAGTCCGG TAGCTCGTTG GGCTCATAAC CCAAAGGTCA GTGGTTCAAA 120

TCCATTCTCC GCAACCAATC CTTTAAACCA CACCACCACC AAACGAACCA AACGAAACAA 180

AAAGCATCAA AATCAAAAAA ATGACAAAAT TTTTAAGAAA ATGACAAAAA AAAAAAAAAC 240

GATTTTATGC TATATTAACG AAATCTTGTG ATAAGATCTT ATTCTTTTAA AAGACTTATC 300

TAACCATTTT AATTTCAAGG AGAAAAC ATG AAA AAA ACC CTT TTA CTC TCT CTC 354
Met Lys Lys Thr Leu Leu Leu Ser Leu
-15

TCT CTC TCT CTC TCG TTT TTG CTC CAC GCT GAA GAC GAC GGC TTT TAC 402
Ser Leu Ser Leu Ser Phe Leu Leu His Ala Glu Asp Asp Gly Phe Tyr
-10 -5 1 5

ACA AGC GTG GGC TAT CAA ATC GGT GAA GCC GCT CAA ATG GTG AAA AAC 450
Thr Ser Val Gly Tyr Gln Ile Gly Glu Ala Ala Gln Met Val Lys Asn
10 15 20

ACC AAA GGC ATT CAA GAG CTT TCA GAC AAT TAT GAA AAG CTG AAC AAT 498
Thr Lys Gly Ile Gln Glu Leu Ser Asp Asn Tyr Glu Lys Leu Asn Asn
25 30 35

CTT TTG AAT AAT TAC AGC ACC CTA AAC ACC CTT ATC AAA TTG TCC GCT 546
Leu Leu Asn Asn Tyr Ser Thr Leu Asn Thr Leu Ile Lys Leu Ser Ala
40 45 50

GAT CCG AGC GCG ATT AAC GAC GCA AGG GAT AAT CTA GGC TCA AGC TCT 594
Asp Pro Ser Ala Ile Asn Asp Ala Arg Asp Asn Leu Gly Ser Ser Ser
55 60 65 70

AGG AAT TTG CTT GAT GTC AAA ACC AAT TCC CCC GCG TAT CAA GCC GTG 642
Arg Asn Leu Leu Asp Val Lys Thr Asn Ser Pro Ala Tyr Gln Ala Val
75 80 85

CTT TTA GCA CTC AAT GCT GCA GTG GGG TTG TGG CAA GTT ACA AGC TAC 690
Leu Leu Ala Leu Asn Ala Ala Val Gly Leu Trp Gln Val Thr Ser Tyr
90 95 100

GCT TTT ACT GCT TGT GGT CCT GGC AGT AAC GAG AAT GCG AAT GGA GGG 738
Ala Phe Thr Ala Cys Gly Pro Gly Ser Asn Glu Asn Ala Asn Gly Gly
105 110 115

ATC CAA ACT TTT AAT AAT GTG CCA GGA CAA GAT ACG ACG ACC ATC ACT 786
Ile Gln Thr Phe Asn Asn Val Pro Gly Gln Asp Thr Thr Thr Ile Thr
120 125 130

TGC AAT TCG TAT TAT GAG CCA GGA CAT GGT GGG CCT ATA TCC ACT GCA 834
Cys Asn Ser Tyr Tyr Glu Pro Gly His Gly Gly Pro Ile Ser Thr Ala
135 140 145 150

AAT TAT GCG AAA ATC AAT CAA GCC TAT CAA ATC ATC CAA AAG GCT TTG 882
Asn Tyr Ala Lys Ile Asn Gln Ala Tyr Gln Ile Ile Gln Lys Ala Leu
155 160 165

ACA GCC AAT GGA GCT AAT GGA GAT GGG GTC CCC GTT TTA AGC AAC ACC 930
Thr Ala Asn Gly Ala Asn Gly Asp Gly Val Pro Val Leu Ser Asn Thr
170 175 180

ACT ACA AAA CTT GAT TTC ACT ATC AAT GGA GAC AAA AGA ACG GGG GGC 978
Thr Thr Lys Leu Asp Phe Thr Ile Asn Gly Asp Lys Arg Thr Gly Gly
185 190 195

AAA CCA AAT ACA CCT GAA AAG TTC CCA TGG AGT GAT GGG AAA TAT ATT 1026
Lys Pro Asn Thr Pro Glu Lys Phe Pro Trp Ser Asp Gly Lys Tyr Ile
200 205 210

CAC ACC CAA TGG ATT AAC ACA ATA GTA ACA CCA ACA GAA ACA AAT ATC 1074
His Thr Gln Trp Ile Asn Thr Ile Val Thr Pro Thr Glu Thr Asn Ile
215 220 225 230

AAC ACA GAA AAT AAC GCT CAA GAG CTT TTA AAA CAA GCG AGC ATC ATT 1122
Asn Thr Glu Asn Asn Ala Gln Glu Leu Leu Lys Gln Ala Ser Ile Ile
235 240 245

ATC ACT ACC CTA AAT GAG GCA TGC CCA AAC TTC CAA AAT GGT GGT AGA 1170
Ile Thr Thr Leu Asn Glu Ala Cys Pro Asn Phe Gln Asn Gly Gly Arg
250 255 260

AGT TAT TGG CAA GGG ATA AGC GGC AAT GGG ACA ATG TGC GGG ATG TTT 1218
Ser Tyr Trp Gln Gly Ile Ser Gly Asn Gly Thr Met Cys Gly Met Phe
265 270 275

AAG AAT GAA ATC AGC GCG ATC CAA GGC ATG ATC GCT AAC GCT CAA GAA 1266
Lys Asn Glu Ile Ser Ala Ile Gln Gly Met Ile Ala Asn Ala Gln Glu
280 285 290

GCT GTC GCG CAA AGC AAA ATC GTT AGT GAA AAC GCG CAA AAT CAA AAC 1314
Ala Val Ala Gln Ser Lys Ile Val Ser Glu Asn Ala Gln Asn Gln Asn
295 300 305 310

AAC TTG GAT ACT GGA AAA CCA TTC AAC CCT TAC ACG GAC GCC AGC TTT 1362
Asn Leu Asp Thr Gly Lys Pro Phe Asn Pro Tyr Thr Asp Ala Ser Phe
315 320 325

GCG CAA AGC ATG CTC AAA AAC GCT CAA GCG CAA GCA GAG ATT TTA AAC 1410
Ala Gln Ser Met Leu Lys Asn Ala Gln Ala Gln Ala Glu Ile Leu Asn
330 335 340

CAA GCC GAA CAA GTA GTA AAA AAC TTT GAA AAA ATC CCT ACA GCC TTT 1458
Gln Ala Glu Gln Val Val Lys Asn Phe Glu Lys Ile Pro Thr Ala Phe
345 350 355

GTA TCA GAC TCT TTA GGG GTG TGT TAT GAA GTG CAA GGG GGT GAG CGT 1506
Val Ser Asp Ser Leu Gly Val Cys Tyr Glu Val Gln Gly Gly Glu Arg
360 365 370

AGG GGC ACC AAT CCA GGT CAG GTA ACT TCT AAC ACT TGG GGA GCC GGT 1554
Arg Gly Thr Asn Pro Gly Gln Val Thr Ser Asn Thr Trp Gly Ala Gly
375 380 385 390

TGC GCG TAT GTG AAA CAA ACC ATA ACG AAT TTA GAC AAC AGC ATC GCT 1602
Cys Ala Tyr Val Lys Gln Thr Ile Thr Asn Leu Asp Asn Ser Ile Ala
395 400 405

CAC TTT GGC ACT CAA GAG CAG CAG ATA CAG CAA GCC GAA AAC ATC GCT 1650
His Phe Gly Thr Gln Glu Gln Gln Ile Gln Gln Ala Glu Asn Ile Ala
410 415 420

GAC ACT CTA GTG AAT TTC AAA TCT AGA TAC AGC GAA TTA GGC AAC ACC 1698
Asp Thr Leu Val Asn Phe Lys Ser Arg Tyr Ser Glu Leu Gly Asn Thr
425 430 435

TAT AAC AGC ATC ACC ACC GCG CTC TCC AAA GTC CCT AAC GCG CAA AGC 1746
Tyr Asn Ser Ile Thr Thr Ala Leu Ser Lys Val Pro Asn Ala Gln Ser
440 445 450

TTG CAA AAC GTG GTG AGC AAA AAG AAT AAC CCC TAT AGC CCT CAA GGC 1794
Leu Gln Asn Val Val Ser Lys Lys Asn Asn Pro Tyr Ser Pro Gln Gly
455 460 465 470

ATA GAG ACC AAT TAC TAC CTC AAT CAA AAT TCT TAC AAC CAA ATC CAA 1842
Ile Glu Thr Asn Tyr Tyr Leu Asn Gln Asn Ser Tyr Asn Gln Ile Gln
475 480 485

ACC ATC AAC CAA GAA CTA GGG CGT AAC CCC TTT AGG AAA GTG GGC ATC 1890
Thr Ile Asn Gln Glu Leu Gly Arg Asn Pro Phe Arg Lys Val Gly Ile
490 495 500

GTC AAT TCT CAA ACC AAC AAT GGT GCC ATG AAT GGG ATC GGT ATT CAG 1938
Val Asn Ser Gln Thr Asn Asn Gly Ala Met Asn Gly Ile Gly Ile Gln
505 510 515

GTG GGC TAT AAG CAA TTC TTT GGC CAA AAA AGA AAA TGG GGC GCT AGG 1986
Val Gly Tyr Lys Gln Phe Phe Gly Gln Lys Arg Lys Trp Gly Ala Arg
520 525 530

TAT TAC GGC TTT TTT GAC TAC AAC CAT GCG TTC ATT AAA TCC AGC TTC 2034
Tyr Tyr Gly Phe Phe Asp Tyr Asn His Ala Phe Ile Lys Ser Ser Phe
535 540 545 550

TTC AAC TCG GCT TCT GAT GTG TGG ACT TAT GGT TTT GGA GCG GAC GCT 2082
Phe Asn Ser Ala Ser Asp Val Trp Thr Tyr Gly Phe Gly Ala Asp Ala
555 560 565

CTT TAT AAC TTC ATC AAC GAT AAA GCC ACC AAT TTC TTA GGC AAA AAC 2130
Leu Tyr Asn Phe Ile Asn Asp Lys Ala Thr Asn Phe Leu Gly Lys Asn
570 575 580

AAC AAG CTT TCC GTG GGG CTT TTT GGA GGG ATT GCG TTA GCG GGC ACT 2178
Asn Lys Leu Ser Val Gly Leu Phe Gly Gly Ile Ala Leu Ala Gly Thr
585 590 595

TCA TGG CTT AAT TCT GAG TAT GTG AAT TTA GCC ACC GTG AAT AAC GTC 2226
Ser Trp Leu Asn Ser Glu Tyr Val Asn Leu Ala Thr Val Asn Asn Val
600 605 610

TAT AAC GCT AAA ATG AAT GTG GCG AAT TTC CAA TTC TTA TTC AAT ATG 2274
Tyr Asn Ala Lys Met Asn Val Ala Asn Phe Gln Phe Leu Phe Asn Met
615 620 625 630

GGA GTG AGG ATG AAT TTA GCC AGA TCC AAG AAA AAA GGC AGC GAT CAT 2322
Gly Val Arg Met Asn Leu Ala Arg Ser Lys Lys Lys Gly Ser Asp His
635 640 645

GCG GCT CAG CAT GGG ATT GAA CTA GGG CTT AAA ATC CCC ACC ATC AAC 2370
Ala Ala Gln His Gly Ile Glu Leu Gly Leu Lys Ile Pro Thr Ile Asn
650 655 660

ACG AAC TAT TAT TCT TTC ATG GGG GCT GAA CTC AAA TAC AGA AGG CTT 2418
Thr Asn Tyr Tyr Ser Phe Met Gly Ala Glu Leu Lys Tyr Arg Arg Leu
665 670 675

TAT AGC GTG TAT TTG AAT TAT GTG TTC GCT TAC TAAGCTTTTT GTGAAACTCC 2471
Tyr Ser Val Tyr Leu Asn Tyr Val Phe Ala Tyr
680 685

CTTTTTAAGG GGTTTTTTTT TGAACTCTCT TTTTAAATTC TCTTTTTAAA GAGATTTCTT 2531

TTTTTTAAGC TTTTTTTTGA ATTCTTTTTT TTGAATTCTT TGTTTTTAAG CTTTTTTTAA 2591

ACCCTTTCGT TTTTAAACTC CCTTTTTTAA GGGATTTCTT TTTTTAAACT CTTTTTTTTT 2651

AAACTCTTTT TTTTAAACCC TCTTTTTTTA AGGGATTTCT TTTTAAAGCT TTTTTGAAGT 2711

CTTTTTTTAA ATTCTTTTTT TGGGGGTTTG ATCTTTCTTT TTGCCAATCC CCACTACTTT 2771

CGCTTTTTAA TCTTTAGGTT TTATTTT 2798
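
The relationship between the coding-sequence coordinates and the translation shown in this listing can be checked with a short sketch. It assumes the standard genetic code, uses one-letter amino acid codes rather than the three-letter codes of the listing, and introduces hypothetical helper names (`translate`, `codon_count`).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string (spaces allowed) up to the first stop codon."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def codon_count(start, end):
    """Number of codons in a feature given 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3:
        raise ValueError("feature length is not a multiple of three")
    return length // 3


if __name__ == "__main__":
    # First nine codons of the coding sequence of SEQ ID NO:1.
    print(translate("ATG AAA AAA ACC CTT TTA CTC TCT CTC"))
    print(codon_count(328, 2451))
```

The first nine codons give MKKTLLLSL, which corresponds to Met Lys Lys Thr Leu Leu Leu Ser Leu, and the coordinates 328...2451 span 708 codons, in agreement with the 708-residue length given for SEQ ID NO:2.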

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 708 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(ix) FEATURE:
(A) NAME/KEY: Signal Sequence
(B) LOCATION: 1...19
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Lys Lys Thr Leu Leu Leu Ser Leu Ser Leu Ser Leu Ser Phe Leu
-15 -10 -5

Leu His Ala Glu Asp Asp Gly Phe Tyr Thr Ser Val Gly Tyr Gln Ile
1 5 10

Gly Glu Ala Ala Gln Met Val Lys Asn Thr Lys Gly Ile Gln Glu Leu
15 20 25

Ser Asp Asn Tyr Glu Lys Leu Asn Asn Leu Leu Asn Asn Tyr Ser Thr
30 35 40 45

Leu Asn Thr Leu Ile Lys Leu Ser Ala Asp Pro Ser Ala Ile Asn Asp
50 55 60

Ala Arg Asp Asn Leu Gly Ser Ser Ser Arg Asn Leu Leu Asp Val Lys
65 70 75

Thr Asn Ser Pro Ala Tyr Gln Ala Val Leu Leu Ala Leu Asn Ala Ala
80 85 90

Val Gly Leu Trp Gln Val Thr Ser Tyr Ala Phe Thr Ala Cys Gly Pro
95 100 105

Gly Ser Asn Glu Asn Ala Asn Gly Gly Ile Gln Thr Phe Asn Asn Val
110 115 120 125

Pro Gly Gln Asp Thr Thr Thr Ile Thr Cys Asn Ser Tyr Tyr Glu Pro
130 135 140

Gly His Gly Gly Pro Ile Ser Thr Ala Asn Tyr Ala Lys Ile Asn Gln
145 150 155

Ala Tyr Gln Ile Ile Gln Lys Ala Leu Thr Ala Asn Gly Ala Asn Gly
160 165 170

Asp Gly Val Pro Val Leu Ser Asn Thr Thr Thr Lys Leu Asp Phe Thr
175 180 185

Ile Asn Gly Asp Lys Arg Thr Gly Gly Lys Pro Asn Thr Pro Glu Lys
190 195 200 205

Phe Pro Trp Ser Asp Gly Lys Tyr Ile His Thr Gln Trp Ile Asn Thr
210 215 220

Ile Val Thr Pro Thr Glu Thr Asn Ile Asn Thr Glu Asn Asn Ala Gln
225 230 235

Glu Leu Leu Lys Gln Ala Ser Ile Ile Ile Thr Thr Leu Asn Glu Ala
240 245 250

Cys Pro Asn Phe Gln Asn Gly Gly Arg Ser Tyr Trp Gln Gly Ile Ser
255 260 265

Gly Asn Gly Thr Met Cys Gly Met Phe Lys Asn Glu Ile Ser Ala Ile
270 275 280 285

Gln Gly Met Ile Ala Asn Ala Gln Glu Ala Val Ala Gln Ser Lys Ile
290 295 300

Val Ser Glu Asn Ala Gln Asn Gln Asn Asn Leu Asp Thr Gly Lys Pro
305 310 315

Phe Asn Pro Tyr Thr Asp Ala Ser Phe Ala Gln Ser Met Leu Lys Asn
320 325 330

Ala Gln Ala Gln Ala Glu Ile Leu Asn Gln Ala Glu Gln Val Val Lys
335 340 345

Asn Phe Glu Lys Ile Pro Thr Ala Phe Val Ser Asp Ser Leu Gly Val
350 355 360 365

Cys Tyr Glu Val Gln Gly Gly Glu Arg Arg Gly Thr Asn Pro Gly Gln
370 375 380

Val Thr Ser Asn Thr Trp Gly Ala Gly Cys Ala Tyr Val Lys Gln Thr
385 390 395

Ile Thr Asn Leu Asp Asn Ser Ile Ala His Phe Gly Thr Gln Glu Gln
400 405 410

Gln Ile Gln Gln Ala Glu Asn Ile Ala Asp Thr Leu Val Asn Phe Lys
415 420 425

Ser Arg Tyr Ser Glu Leu Gly Asn Thr Tyr Asn Ser Ile Thr Thr Ala
430 435 440 445

Leu Ser Lys Val Pro Asn Ala Gln Ser Leu Gln Asn Val Val Ser Lys
450 455 460

Lys Asn Asn Pro Tyr Ser Pro Gln Gly Ile Glu Thr Asn Tyr Tyr Leu
465 470 475

Asn Gln Asn Ser Tyr Asn Gln Ile Gln Thr Ile Asn Gln Glu Leu Gly
480 485 490

Arg Asn Pro Phe Arg Lys Val Gly Ile Val Asn Ser Gln Thr Asn Asn
495 500 505

Gly Ala Met Asn Gly Ile Gly Ile Gln Val Gly Tyr Lys Gln Phe Phe
510 515 520 525

Gly Gln Lys Arg Lys Trp Gly Ala Arg Tyr Tyr Gly Phe Phe Asp Tyr
530 535 540

Asn His Ala Phe Ile Lys Ser Ser Phe Phe Asn Ser Ala Ser Asp Val
545 550 555

Trp Thr Tyr Gly Phe Gly Ala Asp Ala Leu Tyr Asn Phe Ile Asn Asp
560 565 570

Lys Ala Thr Asn Phe Leu Gly Lys Asn Asn Lys Leu Ser Val Gly Leu
575 580 585

Phe Gly Gly Ile Ala Leu Ala Gly Thr Ser Trp Leu Asn Ser Glu Tyr
590 595 600 605

Val Asn Leu Ala Thr Val Asn Asn Val Tyr Asn Ala Lys Met Asn Val
610 615 620

Ala Asn Phe Gln Phe Leu Phe Asn Met Gly Val Arg Met Asn Leu Ala
625 630 635

Arg Ser Lys Lys Lys Gly Ser Asp His Ala Ala Gln His Gly Ile Glu
640 645 650

Leu Gly Leu Lys Ile Pro Thr Ile Asn Thr Asn Tyr Tyr Ser Phe Met
655 660 665

Gly Ala Glu Leu Lys Tyr Arg Arg Leu Tyr Ser Val Tyr Leu Asn Tyr
670 675 680 685

Val Phe Ala Tyr



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2699 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 199...2397
(D) OTHER INFORMATION:

(A) NAME/KEY: Signal Sequence
(B) LOCATION: 199...259
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

TAAAATCCAA TTAAAAGCGT TCAAAGGTAA CGCAAAAAAA CAAAAAATGA CGCAATTTTT 60

TCAAAATGAC AAAAAAAAAC GCTTTATGCT ATAATACCCC AAATACATTC TAATAGCAAA 120

TGCGTTCTAA TGCAAATGCA TTCCAATGTA TGAAATCCCT AATACTAAAT CCAATTTAAT 180

CCAAAAAGGA GAAAAAAC ATG AAA AAA CAC ATC CTT TCA TTA GCT TTA GGC 231
Met Lys Lys His Ile Leu Ser Leu Ala Leu Gly
-20 -15 -10

TCG CTT TTA GTT TCC ACT TTG AGC GCT GAA GAC GAC GGC TTT TAC ACA 279
Ser Leu Leu Val Ser Thr Leu Ser Ala Glu Asp Asp Gly Phe Tyr Thr
-5 1 5

AGC GTA GGC TAT CAG ATC GGT GAA GCC GCT CAA ATG GTA ACA AAC ACC 327
Ser Val Gly Tyr Gln Ile Gly Glu Ala Ala Gln Met Val Thr Asn Thr
10 15 20

AAA GGC ATC CAA CAG CTT TCA GAC AAT TAT GAA AAT TTG AAC AAC CTT 375
Lys Gly Ile Gln Gln Leu Ser Asp Asn Tyr Glu Asn Leu Asn Asn Leu
25 30 35

TTA ACG AGA TAC AGC ACC CTA AAC ACC CTT ATC AAA TTG TCC GCT GAT 423
Leu Thr Arg Tyr Ser Thr Leu Asn Thr Leu Ile Lys Leu Ser Ala Asp
40 45 50 55

CCG AGC GCA ATT AAT GCG GTG CGG GAA AAT CTG GGC GCG AGC GCG AAG 471
Pro Ser Ala Ile Asn Ala Val Arg Glu Asn Leu Gly Ala Ser Ala Lys
60 65 70

AAT TTG ATC GGC GAT AAA GCC AAC TCC CCC GCC TAT CAA GCC GTG CTT 519
Asn Leu Ile Gly Asp Lys Ala Asn Ser Pro Ala Tyr Gln Ala Val Leu
75 80 85

TTA GCG ATC AAC GCG GCG GTA GGG TTT TGG AAT GTC GTG GGC TAT GTG 567
Leu Ala Ile Asn Ala Ala Val Gly Phe Trp Asn Val Val Gly Tyr Val
90 95 100

ACG CAA TGT GGG GGT AAC GCC AAT GGT CAA GAA AGC ACC TCT TCA ACC 615
Thr Gln Cys Gly Gly Asn Ala Asn Gly Gln Glu Ser Thr Ser Ser Thr
105 110 115

ACC ATC TTC AAC AAC GAG CCA GGG TAT CGA TCC ACT TCC ATC ACT TGT 663
Thr Ile Phe Asn Asn Glu Pro Gly Tyr Arg Ser Thr Ser Ile Thr Cys
120 125 130 135

TCT TTG AAC GGG CAT AAG CCT GGA TAC TAT GGC CCT ATG AGC ATT GAG 711
Ser Leu Asn Gly His Lys Pro Gly Tyr Tyr Gly Pro Met Ser Ile Glu
140 145 150

AAT TTT AAA AAG CTT AAC GAA GCC TAT CAG ATC CTC CAA ACG GCT TTA 759
Asn Phe Lys Lys Leu Asn Glu Ala Tyr Gln Ile Leu Gln Thr Ala Leu
155 160 165

AAA AAC GGC TTA CCC GCG CTC AAA GAA AAC AAC GGG AAG GTC AGT GTA 807
Lys Asn Gly Leu Pro Ala Leu Lys Glu Asn Asn Gly Lys Val Ser Val
170 175 180

ACC TAT ACC TAC ACA TGC TCA GGG CAA GGG AAT AAT AAC TGC TCG CCA 855
Thr Tyr Thr Tyr Thr Cys Ser Gly Gln Gly Asn Asn Asn Cys Ser Pro
185 190 195

AGT GTC AAC GGA ACC AAA ACC ACA ACC CAA ACC ATA GAC GGC AAA AGC 903
Ser Val Asn Gly Thr Lys Thr Thr Thr Gln Thr Ile Asp Gly Lys Ser
200 205 210 215

GTA ACC ACC ACG ATC AGT TCA AAA GTG GTT GGT AGC ATC GCT AGT GGC 951
Val Thr Thr Thr Ile Ser Ser Lys Val Val Gly Ser Ile Ala Ser Gly
220 225 230

AAC ACA TCA CAT GTC ATC ACC AAC AAA TTA GAC GGT GTG CCT GAT AGC 999
Asn Thr Ser His Val Ile Thr Asn Lys Leu Asp Gly Val Pro Asp Ser
235 240 245

GCT CAA GCG CTC TTA GCG CAA GCG AGC ACG CTC ATC AAC ACC ATC AAC 1047
Ala Gln Ala Leu Leu Ala Gln Ala Ser Thr Leu Ile Asn Thr Ile Asn
250 255 260

GAA GCA TGC CCG TAT TTC CAT GCT ACT AAT AGT AGT GAG GCT AAC GCC 1095
Glu Ala Cys Pro Tyr Phe His Ala Thr Asn Ser Ser Glu Ala Asn Ala
265 270 275

CCA AAA TTC TCT ACT ACT ACT GGG AAA ATA TGC GGC GCT TTT TCA GAA 1143
Pro Lys Phe Ser Thr Thr Thr Gly Lys Ile Cys Gly Ala Phe Ser Glu
280 285 290 295

GAA ATC AGC GCG ATC CAA AAG ATG ATC ACG GAC GCG CAA GAG CTA GTT 1191
Glu Ile Ser Ala Ile Gln Lys Met Ile Thr Asp Ala Gln Glu Leu Val
300 305 310

AAT CAA ACG AGC GTC ATT AAC AGC AAC GAA CAA TCA ACT CCG GTA GGC 1239
Asn Gln Thr Ser Val Ile Asn Ser Asn Glu Gln Ser Thr Pro Val Gly
315 320 325

AAT AAT AAT GGC AAG CCT TTC AAC CCT TTC ACG GAC GCA AGT TTT GCG 1287
Asn Asn Asn Gly Lys Pro Phe Asn Pro Phe Thr Asp Ala Ser Phe Ala
330 335 340

CAA GGC ATG CTC GCT AAC GCT AGC GCG CAA GCT AAA ATG CTC AAT TTA 1335
Gln Gly Met Leu Ala Asn Ala Ser Ala Gln Ala Lys Met Leu Asn Leu
345 350 355

GCC CAT CAG GTG GGG CAA GCC ATT AAC CCA GAG AAT CTT AGC GAG AAT 1383
Ala His Gln Val Gly Gln Ala Ile Asn Pro Glu Asn Leu Ser Glu Asn
360 365 370 375

TTT AAA AAT TTT GTT ACA GGC TTT TTA GCC ACA TGC AAT AAC AAA TCA 1431
Phe Lys Asn Phe Val Thr Gly Phe Leu Ala Thr Cys Asn Asn Lys Ser
380 385 390

ACA GCT GGC ACT GGT GGC ACA CAA GGT TCA GCT CCA GGC ACA GTG ACC 1479
Thr Ala Gly Thr Gly Gly Thr Gln Gly Ser Ala Pro Gly Thr Val Thr
395 400 405

ACT CAA ACT TTC GCT TCT GGT TGC GCG TAT GTG GAG CAA ACC CTA ACG 1527
Thr Gln Thr Phe Ala Ser Gly Cys Ala Tyr Val Glu Gln Thr Leu Thr
410 415 420

AAC TTA GGC AAC AGC ATC GCT CAC TTT GGC ACT CAA GAG CAG CAG ATA 1575
Asn Leu Gly Asn Ser Ile Ala His Phe Gly Thr Gln Glu Gln Gln Ile
425 430 435

CAG CAA GCC GAA AAC ATC GCT GAC ACT CTA GTG AAT TTC AAA TCT AGA 1623
Gln Gln Ala Glu Asn Ile Ala Asp Thr Leu Val Asn Phe Lys Ser Arg
440 445 450 455

TAC AGC GAA TTA GGC AAC ACC TAT AAC AGC ATC ACC ACC GCG CTC TCC 1671
Tyr Ser Glu Leu Gly Asn Thr Tyr Asn Ser Ile Thr Thr Ala Leu Ser
460 465 470

AAA GTC CCT AAC GCG CAA AGC TTG CAA AAC GTG GTG AGC AAA AAG AAT 1719
Lys Val Pro Asn Ala Gln Ser Leu Gln Asn Val Val Ser Lys Lys Asn
475 480 485

AAC CCC TAT AGC CCT CAA GGC ATA GAG ACC AAT TAC TAC CTC AAT CAA 1767
Asn Pro Tyr Ser Pro Gln Gly Ile Glu Thr Asn Tyr Tyr Leu Asn Gln
490 495 500

AAT TCT TAC AAC CAA ATC CAA ACC ATC AAC CAA GAA CTA GGG CGT AAC 1815
Asn Ser Tyr Asn Gln Ile Gln Thr Ile Asn Gln Glu Leu Gly Arg Asn
505 510 515

CCC TTT AGG AAA GTG GGC ATC GTC AAT TCT CAA ACC AAC AAT GGT GCC 1863
Pro Phe Arg Lys Val Gly Ile Val Asn Ser Gln Thr Asn Asn Gly Ala
520 525 530 535

ATG AAT GGG ATC GGT ATT CAG GTG GGC TAT AAG CAA TTC TTT GGC CAA 1911
Met Asn Gly Ile Gly Ile Gln Val Gly Tyr Lys Gln Phe Phe Gly Gln
540 545 550

AAA AGA AAA TGG GGC GCT AGG TAT TAC GGC TTT TTT GAT TAC AAC CAT 1959
Lys Arg Lys Trp Gly Ala Arg Tyr Tyr Gly Phe Phe Asp Tyr Asn His
555 560 565

GCG TTC ATC AAA TCC AGC TTT TTC AAC TCG GCT TCT GAC GTG TGG ACT 2007
Ala Phe Ile Lys Ser Ser Phe Phe Asn Ser Ala Ser Asp Val Trp Thr
570 575 580

TAT GGT TTT GGA GCG GAC GCG CTT TAT AAC TTC ATC AAC GAT AAA GCC 2 055 

Tyr Gly Phe Gly Ala Asp Ala Leu Tyr Asn Phe He Asn Asp Lys Ala 
585 590 595 

ACC AAT TTC TTA GGC AAA AAC AAC AAG CTT TCT TTG GGG CTT TTT GGC 2103 
Thr Asn Phe Leu Gly Lys Asn Asn Lys Leu Ser Leu Gly Leu Phe Gly 
600 60S 610 615 

GGG ATT GCG TTA GCG GGC ACT TCA TGG CTC AAT TCT GAG TAC GTG AAT 2151 
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Gly lie Ala Leu Ala Gly Thr Ser Trp Leu Asn Ser Glu Tyr Val Asn 

620 625 630 

TTA GCC ACC GTG AAT AAC GTC TAT AAC GCT AAA ATG AAT GTG GCG AAT 2199 

Leu Ala Thr Val Asn Asn Val Tyr Asn Ala Lys Met Asn Val Ala Asn 
635 640 645 

TTC CAA TTC TTA TTC AAT ATG GGA GTG AGG ATG AAT TTA GCC AGA TCC 2247 

Phe Gin Phe Leu Phe Asn Met Gly Val Arg Met Asn Leu Ala Arg Ser 
650 655 660 

AAG AAA AAA GGC AGC GAT CAT GCA GCT CAG CAT GGG ATT GAG TTA GGG 22 95 

Lys Lys Lys Gly Ser Asp His Ala Ala Gin His Gly lie Glu Leu Gly 
665 670 675 

CTT AAA ATC CCC ACC ATC AAC ACG AAC TAT TAT TCC TTT ATG GGG GCT 2343 

Leu Lys lie Pro Thr lie Asn Thr Asn Tyr Tyr Ser Phe Met Gly Ala 

680 685 690 695 

GAA CTC AAA TAC AGA AGG CTC TAT AGC GTG TAT TTG AAC TAT GTG TTC 2391
Glu Leu Lys Tyr Arg Arg Leu Tyr Ser Val Tyr Leu Asn Tyr Val Phe
700 705 710

GCT TAC TAATGTTTGG CTCTTTGTGA AACTCCCTTT TTAAGGGGTT TTTTTTTGAA CT 2449
Ala Tyr

CTCTTTTTAA ATTCTCTTTT TAAAGAGATT TCTTTTTTTT AAGCTTTTTT TTGAATTCTT 2509
TTTTTTTGAA TTCTTTGTTT TTAAGCTTTT TTTAAACCCT TTCGTTTTTA AACTCCCTTT 2569
TTTAAGGGAT TTCTTTTTTT GAACTCCCTT TTTTGAACCC TTTTTTTTAA ACCCTCTTTT 2629
TTTAAGGGGT TTCTTTTTAA AGCTTTTTTG AAGTCTTTTT TTAAATTCTT TTTTTGGGGG 2689
TTTGATCTTT 2699



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 733 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(ix) FEATURE:
(A) NAME/KEY: Signal Sequence
(B) LOCATION: 1...20
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Lys Lys His Ile Leu Ser Leu Ala Leu Gly Ser Leu Leu Val Ser
-20 -15 -10 -5

Thr Leu Ser Ala Glu Asp Asp Gly Phe Tyr Thr Ser Val Gly Tyr Gln
1 5 10

Ile Gly Glu Ala Ala Gln Met Val Thr Asn Thr Lys Gly Ile Gln Gln
15 20 25

Leu Ser Asp Asn Tyr Glu Asn Leu Asn Asn Leu Leu Thr Arg Tyr Ser
30 35 40

Thr Leu Asn Thr Leu Ile Lys Leu Ser Ala Asp Pro Ser Ala Ile Asn
45 50 55 60

Ala Val Arg Glu Asn Leu Gly Ala Ser Ala Lys Asn Leu Ile Gly Asp
65 70 75

Lys Ala Asn Ser Pro Ala Tyr Gln Ala Val Leu Leu Ala Ile Asn Ala
80 85 90

Ala Val Gly Phe Trp Asn Val Val Gly Tyr Val Thr Gln Cys Gly Gly
95 100 105

Asn Ala Asn Gly Gln Glu Ser Thr Ser Ser Thr Thr Ile Phe Asn Asn
110 115 120

Glu Pro Gly Tyr Arg Ser Thr Ser Ile Thr Cys Ser Leu Asn Gly His
125 130 135 140

Lys Pro Gly Tyr Tyr Gly Pro Met Ser Ile Glu Asn Phe Lys Lys Leu
145 150 155

Asn Glu Ala Tyr Gln Ile Leu Gln Thr Ala Leu Lys Asn Gly Leu Pro
160 165 170

Ala Leu Lys Glu Asn Asn Gly Lys Val Ser Val Thr Tyr Thr Tyr Thr
175 180 185

Cys Ser Gly Gln Gly Asn Asn Asn Cys Ser Pro Ser Val Asn Gly Thr
190 195 200

Lys Thr Thr Thr Gln Thr Ile Asp Gly Lys Ser Val Thr Thr Thr Ile
205 210 215 220

Ser Ser Lys Val Val Gly Ser Ile Ala Ser Gly Asn Thr Ser His Val
225 230 235

Ile Thr Asn Lys Leu Asp Gly Val Pro Asp Ser Ala Gln Ala Leu Leu
240 245 250

Ala Gln Ala Ser Thr Leu Ile Asn Thr Ile Asn Glu Ala Cys Pro Tyr
255 260 265

Phe His Ala Thr Asn Ser Ser Glu Ala Asn Ala Pro Lys Phe Ser Thr
270 275 280

Thr Thr Gly Lys Ile Cys Gly Ala Phe Ser Glu Glu Ile Ser Ala Ile
285 290 295 300

Gln Lys Met Ile Thr Asp Ala Gln Glu Leu Val Asn Gln Thr Ser Val
305 310 315

Ile Asn Ser Asn Glu Gln Ser Thr Pro Val Gly Asn Asn Asn Gly Lys
320 325 330

Pro Phe Asn Pro Phe Thr Asp Ala Ser Phe Ala Gln Gly Met Leu Ala
335 340 345

Asn Ala Ser Ala Gln Ala Lys Met Leu Asn Leu Ala His Gln Val Gly
350 355 360

Gln Ala Ile Asn Pro Glu Asn Leu Ser Glu Asn Phe Lys Asn Phe Val
365 370 375 380

Thr Gly Phe Leu Ala Thr Cys Asn Asn Lys Ser Thr Ala Gly Thr Gly
385 390 395

Gly Thr Gln Gly Ser Ala Pro Gly Thr Val Thr Thr Gln Thr Phe Ala
400 405 410

Ser Gly Cys Ala Tyr Val Glu Gln Thr Leu Thr Asn Leu Gly Asn Ser
415 420 425

Ile Ala His Phe Gly Thr Gln Glu Gln Gln Ile Gln Gln Ala Glu Asn
430 435 440

Ile Ala Asp Thr Leu Val Asn Phe Lys Ser Arg Tyr Ser Glu Leu Gly
445 450 455 460

Asn Thr Tyr Asn Ser Ile Thr Thr Ala Leu Ser Lys Val Pro Asn Ala
465 470 475

Gln Ser Leu Gln Asn Val Val Ser Lys Lys Asn Asn Pro Tyr Ser Pro
480 485 490

Gln Gly Ile Glu Thr Asn Tyr Tyr Leu Asn Gln Asn Ser Tyr Asn Gln
495 500 505

Ile Gln Thr Ile Asn Gln Glu Leu Gly Arg Asn Pro Phe Arg Lys Val
510 515 520

Gly Ile Val Asn Ser Gln Thr Asn Asn Gly Ala Met Asn Gly Ile Gly
525 530 535 540

Ile Gln Val Gly Tyr Lys Gln Phe Phe Gly Gln Lys Arg Lys Trp Gly
545 550 555

Ala Arg Tyr Tyr Gly Phe Phe Asp Tyr Asn His Ala Phe Ile Lys Ser
560 565 570

Ser Phe Phe Asn Ser Ala Ser Asp Val Trp Thr Tyr Gly Phe Gly Ala
575 580 585

Asp Ala Leu Tyr Asn Phe Ile Asn Asp Lys Ala Thr Asn Phe Leu Gly
590 595 600

Lys Asn Asn Lys Leu Ser Leu Gly Leu Phe Gly Gly Ile Ala Leu Ala
605 610 615 620

Gly Thr Ser Trp Leu Asn Ser Glu Tyr Val Asn Leu Ala Thr Val Asn
625 630 635

Asn Val Tyr Asn Ala Lys Met Asn Val Ala Asn Phe Gln Phe Leu Phe
640 645 650

Asn Met Gly Val Arg Met Asn Leu Ala Arg Ser Lys Lys Lys Gly Ser
655 660 665

Asp His Ala Ala Gln His Gly Ile Glu Leu Gly Leu Lys Ile Pro Thr
670 675 680

Ile Asn Thr Asn Tyr Tyr Ser Phe Met Gly Ala Glu Leu Lys Tyr Arg
685 690 695 700

Arg Leu Tyr Ser Val Tyr Leu Asn Tyr Val Phe Ala Tyr
705 710

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2915 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 365...2597
(D) OTHER INFORMATION:

(A) NAME/KEY: Signal Sequence
(B) LOCATION: 365...425
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:




TTTTAGGCGA CAAAATCGCT TATGTTGGGG ATAAAGGCAA CCCGCACAAT TTCGCTCACA 60
AGAAATAAAC CGCTCATAAG GGGCAAACGC CCCAAAAAAG CGATTTTTAA AGAGGTTACG 120
GCAAAATCAA GCTCTTTAGT ATTTAATCTT AAAAAATGCT AAAAGCCTTT TTATGGGCTA 180
ACACCACACA AAAAGCATCA AAATCAAAAA AATGACAAAA TTTTTAAGAA AATGACAAAA 240
AAAAACGCTT TATGCTATAA TACCCCAAAT ACATTCTAAT AGCAAATGCG TTCTAATGCA 300
AATGCATTCC AATGTATGAA ATCCCTAATA CTAAATCCAA TTTAATCCAA AAAGGAGAAA 360

AAAC ATG AAA AAA CAC ATC CTT TCA TTA GCT TTA GGC TCG CTT TTA GTT 409
Met Lys Lys His Ile Leu Ser Leu Ala Leu Gly Ser Leu Leu Val
-20 -15 -10

TCC ACT TTG AGC GCT GAA GAC GAC GGC TTT TAC ACA AGC GTA GGC TAT 457
Ser Thr Leu Ser Ala Glu Asp Asp Gly Phe Tyr Thr Ser Val Gly Tyr
-5 1 5 10

CAG ATC GGT GAA GCC GCT CAA ATG GTA ACA AAC ACC AAA GGC ATC CAA 505
Gln Ile Gly Glu Ala Ala Gln Met Val Thr Asn Thr Lys Gly Ile Gln
15 20 25

CAG CTT TCA GAC AAT TAT GAA AAT TTG AAC AAC CTT TTA ACG AGA TAC 553
Gln Leu Ser Asp Asn Tyr Glu Asn Leu Asn Asn Leu Leu Thr Arg Tyr
30 35 40

AGC ACC CTA AAC ACC CTT ATC AAA TTG TCC GCT GAT CCG AGC GCA ATT 601
Ser Thr Leu Asn Thr Leu Ile Lys Leu Ser Ala Asp Pro Ser Ala Ile
45 50 55

AAT GCG GTG CGG GAA AAT CTG GGC GCG AGC ACG AAG AAT TTG ATC GGC 649
Asn Ala Val Arg Glu Asn Leu Gly Ala Ser Thr Lys Asn Leu Ile Gly
60 65 70 75

GAT AAA GCC AAC TCC CCG GCG TAT CAA GCC GTG TTT TTA GCG ATC AAC 697
Asp Lys Ala Asn Ser Pro Ala Tyr Gln Ala Val Phe Leu Ala Ile Asn
80 85 90

GCG GCG GTA GGG TTG TGG AAT ACC ATC GGC TAT GCG GTC ATG TGC GGG 745
Ala Ala Val Gly Leu Trp Asn Thr Ile Gly Tyr Ala Val Met Cys Gly
95 100 105

AAC GGG AAC GGC ACA GAG AGT GGG CCT GGC AGC GTG ATC TTT AAT GAC 793
Asn Gly Asn Gly Thr Glu Ser Gly Pro Gly Ser Val Ile Phe Asn Asp
110 115 120

CAA CCA GGA CAG GAT TCC ACG CAA ATT ACT TGC AAC CGC TTT GAA TCA 841
Gln Pro Gly Gln Asp Ser Thr Gln Ile Thr Cys Asn Arg Phe Glu Ser
125 130 135

ACT GGG CCT GGT AAA AGC ATG TCT ATT GAT GAA TTC AAA AAA CTC AAT 889
Thr Gly Pro Gly Lys Ser Met Ser Ile Asp Glu Phe Lys Lys Leu Asn
140 145 150 155

GAA GCC TAT CAA ATC ATC CAG CAA GCT TTA AAA AAT CAA AGT GGG TTT 937
Glu Ala Tyr Gln Ile Ile Gln Gln Ala Leu Lys Asn Gln Ser Gly Phe
160 165 170

CCT GAA TTA GGC GGG AAC GGC ACA AAA GTG AGT GTT AAT TAC AAT TAC 985
Pro Glu Leu Gly Gly Asn Gly Thr Lys Val Ser Val Asn Tyr Asn Tyr
175 180 185

GAA TGC AGA CAA ACT GCT GAT ATC AAC GGC GGT GTG TAT CAG TTC TGC 1033
Glu Cys Arg Gln Thr Ala Asp Ile Asn Gly Gly Val Tyr Gln Phe Cys
190 195 200

AAG GCT AAA AAT GGT AGT AGT AGC AGT AGT AAT GGC GGT AAT GGC AGT 1081
Lys Ala Lys Asn Gly Ser Ser Ser Ser Ser Asn Gly Gly Asn Gly Ser
205 210 215

AGC ACG CAA ACA ACC GCG ACA ACC ACG CAA GAC GGC GTA ACG ATC ACC 1129
Ser Thr Gln Thr Thr Ala Thr Thr Thr Gln Asp Gly Val Thr Ile Thr
220 225 230 235

ACT ACC TAT AAT AAT AAC AAA GCC ACC GTC AAA TTT GAC ATC ACC AAT 1177
Thr Thr Tyr Asn Asn Asn Lys Ala Thr Val Lys Phe Asp Ile Thr Asn
240 245 250

AAC GCT GAA CAG CTG TTA AAT CAA GCG GCA AAC ATC ATG CAA GTC CTT 1225
Asn Ala Glu Gln Leu Leu Asn Gln Ala Ala Asn Ile Met Gln Val Leu
255 260 265



AAT ACG CAA TGC CCT TTA GTG CGT TCC ACG AAT AAC GAA AAC ACT CCA 1273
Asn Thr Gln Cys Pro Leu Val Arg Ser Thr Asn Asn Glu Asn Thr Pro
270 275 280

GGG GGT GGT CAA CCA TGG GGT TTA AGC ACA TCC GGG AAT GCG TGC AGC 1321
Gly Gly Gly Gln Pro Trp Gly Leu Ser Thr Ser Gly Asn Ala Cys Ser
285 290 295

ATC TTC CAA CAA GAA TTT AGC CAG GTT ACT AGC ATG ATC AAA AAC GCC 1369
Ile Phe Gln Gln Glu Phe Ser Gln Val Thr Ser Met Ile Lys Asn Ala
300 305 310 315

CAA GAA ATA ATC GCG CAA AGC AAA ATC GTT AGT GAA AAC GCG CAA AAT 1417
Gln Glu Ile Ile Ala Gln Ser Lys Ile Val Ser Glu Asn Ala Gln Asn
320 325 330

CAA AAC AAC TTG GAT ACT GGA AAA CCA TTC AAC CCT TAC ACG GAC GCC 1465
Gln Asn Asn Leu Asp Thr Gly Lys Pro Phe Asn Pro Tyr Thr Asp Ala
335 340 345

AGC TTT GCG CAA AGC ATG CTC AAA AAC GCT CAA GCG CAA GCA GAG ATG 1513
Ser Phe Ala Gln Ser Met Leu Lys Asn Ala Gln Ala Gln Ala Glu Met
350 355 360

TTC AAT TTG AGC GAA CAA GTG AAA AAG AAC TTG GAA GTC ATG AAA AAC 1561
Phe Asn Leu Ser Glu Gln Val Lys Lys Asn Leu Glu Val Met Lys Asn
365 370 375

AAC AAT AAT GTT AAC GAG AAA TTA GCA GGA TTT GGG AAA GAA GAA GTA 1609
Asn Asn Asn Val Asn Glu Lys Leu Ala Gly Phe Gly Lys Glu Glu Val
380 385 390 395

ATG ACC AAT TTT GTT AGC GCC TTT TTG GCA AGC TGC AAA GAT GGT GGC 1657
Met Thr Asn Phe Val Ser Ala Phe Leu Ala Ser Cys Lys Asp Gly Gly
400 405 410

ACA TTG CCT AAT GCA GGG GTT ACT TCT AAC ACT TGG GGG GCG GGT TGC 1705
Thr Leu Pro Asn Ala Gly Val Thr Ser Asn Thr Trp Gly Ala Gly Cys
415 420 425

GCG TAT GTG GGA GAG ACG ATA AGC GCC CTA ACC AAC AGC ATC GCT CAC 1753
Ala Tyr Val Gly Glu Thr Ile Ser Ala Leu Thr Asn Ser Ile Ala His
430 435 440

TTT GGC ACT CAA GAG CAG CAG ATA CAG CAA GCC GAA AAC ATC GCT GAC 1801
Phe Gly Thr Gln Glu Gln Gln Ile Gln Gln Ala Glu Asn Ile Ala Asp
445 450 455

ACT CTA GTG AAT TTC AAA TCT AGA TAC AGC GAA TTA GGC AAC ACC TAT 1849
Thr Leu Val Asn Phe Lys Ser Arg Tyr Ser Glu Leu Gly Asn Thr Tyr
460 465 470 475

AAC AGC ATC ACC ACC GCG CTC TCC AAA GTC CCT AAC GCG CAA AGC TTG 1897
Asn Ser Ile Thr Thr Ala Leu Ser Lys Val Pro Asn Ala Gln Ser Leu
480 485 490

CAA AAC GTG GTG AGC AAA AAG AAT AAC CCC TAT AGC CCT CAA GGC ATA 1945
Gln Asn Val Val Ser Lys Lys Asn Asn Pro Tyr Ser Pro Gln Gly Ile
495 500 505

GAG ACC AAT TAC TAC CTC AAT CAA AAT TCT TAC AAC CAA ATC CAA ACC 1993
Glu Thr Asn Tyr Tyr Leu Asn Gln Asn Ser Tyr Asn Gln Ile Gln Thr
510 515 520

ATC AAC CAA GAA CTA GGG CGT AAC CCC TTT AGG AAA GTG GGC ATC GTC 2041
Ile Asn Gln Glu Leu Gly Arg Asn Pro Phe Arg Lys Val Gly Ile Val
525 530 535

AAT TCT CAA ACC AAC AAT GGT GCC ATG AAT GGG ATC GGC ATT CAG GTG 2089
Asn Ser Gln Thr Asn Asn Gly Ala Met Asn Gly Ile Gly Ile Gln Val
540 545 550 555

GGC TAT AAG CAA TTC TTT GGC CAA AAA AGA AAA TGG GGC GCT AGG TAT 2137
Gly Tyr Lys Gln Phe Phe Gly Gln Lys Arg Lys Trp Gly Ala Arg Tyr
560 565 570

TAC GGC TTT TTT GAT TAC AAC CAT GCG TTC ATC AAA TCC AGC TTT TTC 2185
Tyr Gly Phe Phe Asp Tyr Asn His Ala Phe Ile Lys Ser Ser Phe Phe
575 580 585

AAC TCG GCT TCT GAC GTG TGG ACT TAT GGT TTT GGA GCG GAC GCG CTT 2233
Asn Ser Ala Ser Asp Val Trp Thr Tyr Gly Phe Gly Ala Asp Ala Leu
590 595 600

TAT AAC TTC ATC AAC GAT AAA GCC ACC AAT TTC TTA GGC AAA AAC AAC 2281
Tyr Asn Phe Ile Asn Asp Lys Ala Thr Asn Phe Leu Gly Lys Asn Asn
605 610 615

AAG CTT TCT TTG GGG CTT TTT GGC GGG ATT GCG TTA GCG GGC ACT TCA 2329
Lys Leu Ser Leu Gly Leu Phe Gly Gly Ile Ala Leu Ala Gly Thr Ser
620 625 630 635

TGG CTC AAT TCT GAG TAC GTG AAT TTA GCC ACC GTG AAT AAC GTC TAT 2377
Trp Leu Asn Ser Glu Tyr Val Asn Leu Ala Thr Val Asn Asn Val Tyr
640 645 650

AAC GCT AAA ATG AAT GTG GCG AAT TTC CAA TTC TTA TTC AAT ATG GGA 2425
Asn Ala Lys Met Asn Val Ala Asn Phe Gln Phe Leu Phe Asn Met Gly
655 660 665

GTG AGG ATG AAT TTA GCC AGA TCC AAG AAA AAA GGC AGC GAT CAT GCA 2473
Val Arg Met Asn Leu Ala Arg Ser Lys Lys Lys Gly Ser Asp His Ala
670 675 680

GCT CAG CAT GGG ATT GAG TTA GGG CTT AAA ATC CCC ACC ATC AAC ACG 2521
Ala Gln His Gly Ile Glu Leu Gly Leu Lys Ile Pro Thr Ile Asn Thr
685 690 695

AAC TAT TAT TCC TTT ATG GGG GCT GAA CTC AAA TAC AGA AGG CTC TAT 2569
Asn Tyr Tyr Ser Phe Met Gly Ala Glu Leu Lys Tyr Arg Arg Leu Tyr
700 705 710 715

AGC GTG TAT TTG AAT NAT GTG TTC GCT TAC TAAGCTTTTT GTGAAACTCC 2619
Ser Val Tyr Leu Asn Xaa Val Phe Ala Tyr
720 725

CTTTTTAAGG GGTTTTTTTT TGAACTCTCT TTTAAATTCT CTTTTTAAAG AGATTTCTTT 2679
TTTTAAGCTT TTTTTTGAAC TTTTTTTTGA ATTCTTTGTT TTTAAGCTTT TTTTAAACCC 2739
TTTCGTTTTT AAACTCCCTT TTTTAAGGGA TTTCTTTTTT TGAACTCCCT TTTTTGAACC 2799
CTTTTTTTTA AACCCTCTTT TTTTAAGGGG TTTCTTTTTA AAGCTTTTTT GAAGTCTTTT 2859
TTTAAATTCT TTTTTTGGGG GTTTGATCTT TCTTTTTGCC AATCCCCACT ACTTTC 2915



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 745 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(ix) FEATURE:
(A) NAME/KEY: Signal Sequence
(B) LOCATION: 1...20
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Lys Lys His Ile Leu Ser Leu Ala Leu Gly Ser Leu Leu Val Ser
-20 -15 -10 -5

Thr Leu Ser Ala Glu Asp Asp Gly Phe Tyr Thr Ser Val Gly Tyr Gln
1 5 10

Ile Gly Glu Ala Ala Gln Met Val Thr Asn Thr Lys Gly Ile Gln Gln
15 20 25

Leu Ser Asp Asn Tyr Glu Asn Leu Asn Asn Leu Leu Thr Arg Tyr Ser
30 35 40

Thr Leu Asn Thr Leu Ile Lys Leu Ser Ala Asp Pro Ser Ala Ile Asn
45 50 55 60

Ala Val Arg Glu Asn Leu Gly Ala Ser Thr Lys Asn Leu Ile Gly Asp
65 70 75

Lys Ala Asn Ser Pro Ala Tyr Gln Ala Val Phe Leu Ala Ile Asn Ala
80 85 90

Ala Val Gly Leu Trp Asn Thr Ile Gly Tyr Ala Val Met Cys Gly Asn
95 100 105

Gly Asn Gly Thr Glu Ser Gly Pro Gly Ser Val Ile Phe Asn Asp Gln
110 115 120

Pro Gly Gln Asp Ser Thr Gln Ile Thr Cys Asn Arg Phe Glu Ser Thr
125 130 135 140

Gly Pro Gly Lys Ser Met Ser Ile Asp Glu Phe Lys Lys Leu Asn Glu
145 150 155

Ala Tyr Gln Ile Ile Gln Gln Ala Leu Lys Asn Gln Ser Gly Phe Pro
160 165 170

Glu Leu Gly Gly Asn Gly Thr Lys Val Ser Val Asn Tyr Asn Tyr Glu
175 180 185

Cys Arg Gln Thr Ala Asp Ile Asn Gly Gly Val Tyr Gln Phe Cys Lys
190 195 200

Ala Lys Asn Gly Ser Ser Ser Ser Ser Asn Gly Gly Asn Gly Ser Ser
205 210 215 220

Thr Gln Thr Thr Ala Thr Thr Thr Gln Asp Gly Val Thr Ile Thr Thr
225 230 235

Thr Tyr Asn Asn Asn Lys Ala Thr Val Lys Phe Asp Ile Thr Asn Asn
240 245 250

Ala Glu Gln Leu Leu Asn Gln Ala Ala Asn Ile Met Gln Val Leu Asn
255 260 265

Thr Gln Cys Pro Leu Val Arg Ser Thr Asn Asn Glu Asn Thr Pro Gly
270 275 280

Gly Gly Gln Pro Trp Gly Leu Ser Thr Ser Gly Asn Ala Cys Ser Ile
285 290 295 300

Phe Gln Gln Glu Phe Ser Gln Val Thr Ser Met Ile Lys Asn Ala Gln
305 310 315

Glu Ile Ile Ala Gln Ser Lys Ile Val Ser Glu Asn Ala Gln Asn Gln
320 325 330

Asn Asn Leu Asp Thr Gly Lys Pro Phe Asn Pro Tyr Thr Asp Ala Ser
335 340 345

Phe Ala Gln Ser Met Leu Lys Asn Ala Gln Ala Gln Ala Glu Met Phe
350 355 360

Asn Leu Ser Glu Gln Val Lys Lys Asn Leu Glu Val Met Lys Asn Asn
365 370 375 380

Asn Asn Val Asn Glu Lys Leu Ala Gly Phe Gly Lys Glu Glu Val Met
385 390 395

Thr Asn Phe Val Ser Ala Phe Leu Ala Ser Cys Lys Asp Gly Gly Thr
400 405 410

Leu Pro Asn Ala Gly Val Thr Ser Asn Thr Trp Gly Ala Gly Cys Ala
415 420 425

Tyr Val Gly Glu Thr Ile Ser Ala Leu Thr Asn Ser Ile Ala His Phe
430 435 440

Gly Thr Gln Glu Gln Gln Ile Gln Gln Ala Glu Asn Ile Ala Asp Thr
445 450 455 460

Leu Val Asn Phe Lys Ser Arg Tyr Ser Glu Leu Gly Asn Thr Tyr Asn
465 470 475

Ser Ile Thr Thr Ala Leu Ser Lys Val Pro Asn Ala Gln Ser Leu Gln
480 485 490




Asn Val Val Ser Lys Lys Asn Asn Pro Tyr Ser Pro Gln Gly Ile Glu
495 500 505

Thr Asn Tyr Tyr Leu Asn Gln Asn Ser Tyr Asn Gln Ile Gln Thr Ile
510 515 520

Asn Gln Glu Leu Gly Arg Asn Pro Phe Arg Lys Val Gly Ile Val Asn
525 530 535 540

Ser Gln Thr Asn Asn Gly Ala Met Asn Gly Ile Gly Ile Gln Val Gly
545 550 555

Tyr Lys Gln Phe Phe Gly Gln Lys Arg Lys Trp Gly Ala Arg Tyr Tyr
560 565 570

Gly Phe Phe Asp Tyr Asn His Ala Phe Ile Lys Ser Ser Phe Phe Asn
575 580 585

Ser Ala Ser Asp Val Trp Thr Tyr Gly Phe Gly Ala Asp Ala Leu Tyr
590 595 600

Asn Phe Ile Asn Asp Lys Ala Thr Asn Phe Leu Gly Lys Asn Asn Lys
605 610 615 620

Leu Ser Leu Gly Leu Phe Gly Gly Ile Ala Leu Ala Gly Thr Ser Trp
625 630 635

Leu Asn Ser Glu Tyr Val Asn Leu Ala Thr Val Asn Asn Val Tyr Asn
640 645 650

Ala Lys Met Asn Val Ala Asn Phe Gln Phe Leu Phe Asn Met Gly Val
655 660 665

Arg Met Asn Leu Ala Arg Ser Lys Lys Lys Gly Ser Asp His Ala Ala
670 675 680

Gln His Gly Ile Glu Leu Gly Leu Lys Ile Pro Thr Ile Asn Thr Asn
685 690 695 700

Tyr Tyr Ser Phe Met Gly Ala Glu Leu Lys Tyr Arg Arg Leu Tyr Ser
705 710 715

Val Tyr Leu Asn Xaa Val Phe Ala Tyr
720 725

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2603 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 210...2342
(D) OTHER INFORMATION:

(A) NAME/KEY: Signal Sequence
(B) LOCATION: 210...270
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
ATGACCTTTA TTGGTTTAAT ATTTGTTTAG AAATAACACA AAAACCTTTT TTTTTTTTTT 60
TGAAAGGGCA AAAACGCCTA ATTAATATCA AAATCCCATG AATTTATACT ATATTAACGA 120
AAGCTTGCGG TATGGTTTCA CCTAAAGACA CACTTCCGCA AGATTTACTA ACAATTTCAA 180

TCTTATTTCA AGTAATAAAA GGAGAAAAC ATG AAG AAA AAA TTT CTG TCA TTA 233
Met Lys Lys Lys Phe Leu Ser Leu
-20 -15

ACC TTA GGT TCG CTT TTA GTT TCC GCT TTA AGC GCT GAA GAC AAC GGC 281
Thr Leu Gly Ser Leu Leu Val Ser Ala Leu Ser Ala Glu Asp Asn Gly
-10 -5 1

TTT TTT GTG AGT GCG GGC TAT CAA ATC GGT GAA TCC GCT CAA ATG GTG 329
Phe Phe Val Ser Ala Gly Tyr Gln Ile Gly Glu Ser Ala Gln Met Val
5 10 15 20

AAA AAC ACT AAA GGC ATT CAA GAT CTT TCA GAT AGC TAT GAA AGA CTG 377
Lys Asn Thr Lys Gly Ile Gln Asp Leu Ser Asp Ser Tyr Glu Arg Leu
25 30 35

AAC AAT CTT TTA ACG AGT TAT AGT GCC CTA AAC ACT CTT ATT AGG CAG 425
Asn Asn Leu Leu Thr Ser Tyr Ser Ala Leu Asn Thr Leu Ile Arg Gln
40 45 50

TCC GCC GAC CCC AAC GCT ATC AAT AAC GCA AGG GGC AAT TTG AAC GCT 473
Ser Ala Asp Pro Asn Ala Ile Asn Asn Ala Arg Gly Asn Leu Asn Ala
55 60 65

AGT GCG AAG AAT TTG ATC AAT GAT AAA AAG AAT TCC CCG GCG TAT CAA 521
Ser Ala Lys Asn Leu Ile Asn Asp Lys Lys Asn Ser Pro Ala Tyr Gln
70 75 80

GCG GTG CTT TTA GCC TTG AAT GCG GCA GCG GGG TTG TGG CAA GTC ATG 569
Ala Val Leu Leu Ala Leu Asn Ala Ala Ala Gly Leu Trp Gln Val Met
85 90 95 100

AGC TAT TCG ATC AGC GTT TGT GGC CCT GGC TCT GAC AAA AAT AAA AAT 617
Ser Tyr Ser Ile Ser Val Cys Gly Pro Gly Ser Asp Lys Asn Lys Asn
105 110 115

GGG GGC GTC CAA ACC TTT GAA AAT GTG CCG TCA AAT GGG GGG ACT ACC 665
Gly Gly Val Gln Thr Phe Glu Asn Val Pro Ser Asn Gly Gly Thr Thr
120 125 130

ATT GCT TGC GAT TCA TTT TAT GAA CCA GGA AAG TGG AGC GGT ATA TCC 713
Ile Ala Cys Asp Ser Phe Tyr Glu Pro Gly Lys Trp Ser Gly Ile Ser
135 140 145

ACT GAA AAT TAC GCA AAA ATC AAT AAA GCC TAT CAA ATC ATC CAA AAG 761
Thr Glu Asn Tyr Ala Lys Ile Asn Lys Ala Tyr Gln Ile Ile Gln Lys
150 155 160

GCT TTT GGA GCA AGC GGG CAA GAT ATT CCT GCC TTA AGC GAC ACC AAA 809
Ala Phe Gly Ala Ser Gly Gln Asp Ile Pro Ala Leu Ser Asp Thr Lys
165 170 175 180

GAA CTT AAT TTT GAA ATT AAA GGG AAA AAA AAT GAT AGC GTC CAG CCA 857
Glu Leu Asn Phe Glu Ile Lys Gly Lys Lys Asn Asp Ser Val Gln Pro
185 190 195

GGA GAA AGA TGG AAA TTC CCA TGG ACT AAT GGA AAA TTT GTT TCA GTC 905
Gly Glu Arg Trp Lys Phe Pro Trp Thr Asn Gly Lys Phe Val Ser Val
200 205 210

AAG TGG GTG AAT GGG AAG TAT GAA GAA ATT AAA GAA GAC ATC AAA GTG 953
Lys Trp Val Asn Gly Lys Tyr Glu Glu Ile Lys Glu Asp Ile Lys Val
215 220 225

TCA AAT AAC GCT CAA GAG CTT TTA AAA CAG GCT AGC ACT ATT TTA ACC 1001
Ser Asn Asn Ala Gln Glu Leu Leu Lys Gln Ala Ser Thr Ile Leu Thr
230 235 240

ACT CTT AAT GAA GCA TGC CCA TGG TTG AGT AAT GGT GGT GCA GGC AAT 1049
Thr Leu Asn Glu Ala Cys Pro Trp Leu Ser Asn Gly Gly Ala Gly Asn
245 250 255 260

GTG GCC GGT GGC AAT AGT TTA TGG GCC GGA ATA GAT AAA GGC GAC GGG 1097
Val Ala Gly Gly Asn Ser Leu Trp Ala Gly Ile Asp Lys Gly Asp Gly
265 270 275

AGC GCA TGC GGG ATT TTT AAA AAT GAA ATC AGC GCG ATT CAA GAC ATG 1145
Ser Ala Cys Gly Ile Phe Lys Asn Glu Ile Ser Ala Ile Gln Asp Met
280 285 290

ATC AAA AAC GCT GAA ATA GCC GTA GAG CAA TCC AAA ATC GTT ACC GCC 1193
Ile Lys Asn Ala Glu Ile Ala Val Glu Gln Ser Lys Ile Val Thr Ala
295 300 305

AAC GCG CAA AAC CAG CAC AAC CTA GAC ACT GGG AAA GCA TTC AAC CCC 1241
Asn Ala Gln Asn Gln His Asn Leu Asp Thr Gly Lys Ala Phe Asn Pro
310 315 320

TAT AAA GAC GCC AAC TTC GCC CAA AGC ATG TTC GCT AAC GCT AGA GCG 1289
Tyr Lys Asp Ala Asn Phe Ala Gln Ser Met Phe Ala Asn Ala Arg Ala
325 330 335 340

CAA GCG GAG ATT TTA AAC CGC GCT CAA GCA GTG GTG AAG GAC TTT GAA 1337
Gln Ala Glu Ile Leu Asn Arg Ala Gln Ala Val Val Lys Asp Phe Glu
345 350 355

AGA ATC CCT GCA GCG TTC GTG AAA GAC TCT TTA GGA GTA TGC CAT GAA 1385
Arg Ile Pro Ala Ala Phe Val Lys Asp Ser Leu Gly Val Cys His Glu
360 365 370

AAG GGT AGC GAC GGC AAT CTC CGT GGC ACG CCA TCT GGC ACG GTT ACT 1433
Lys Gly Ser Asp Gly Asn Leu Arg Gly Thr Pro Ser Gly Thr Val Thr
375 380 385

TCT AAC ACT TGG GGA GCC GGC TGC GCG TAT GTG GGA GAA ACC GTA ACG 1481
Ser Asn Thr Trp Gly Ala Gly Cys Ala Tyr Val Gly Glu Thr Val Thr
390 395 400

AAT CTA AAA AAC AGC ATC GCT CAT TTT GGC GAC CAA GCG GAG CGA ATC 1529
Asn Leu Lys Asn Ser Ile Ala His Phe Gly Asp Gln Ala Glu Arg Ile
405 410 415 420

CAT AAT GCG CGA AAT CTC GCC TAC ACT TTA GCG AAT TTC AGC GGC CAG 1577
His Asn Ala Arg Asn Leu Ala Tyr Thr Leu Ala Asn Phe Ser Gly Gln
425 430 435

TAC AAA AAG CTA GGC GAA CAC TAT GAC AGC ATC ACA GCG GCG CTC TCT 1625
Tyr Lys Lys Leu Gly Glu His Tyr Asp Ser Ile Thr Ala Ala Leu Ser
440 445 450

AGC TTG CCT GAT GCG CAA TCT TTA CAA AAT GTG GTG AGC AAA AAG ACT 1673
Ser Leu Pro Asp Ala Gln Ser Leu Gln Asn Val Val Ser Lys Lys Thr
455 460 465

AAC CCT AAC AGC CCG CAA GGC ATA CAG GAT AAT TAC TAC ATT GAC TCC 1721
Asn Pro Asn Ser Pro Gln Gly Ile Gln Asp Asn Tyr Tyr Ile Asp Ser
470 475 480

AAC ATC CAT TCT CAA GTG CAA TCT AGG AGT CAA GAA CTC GGC AGT AAC 1769
Asn Ile His Ser Gln Val Gln Ser Arg Ser Gln Glu Leu Gly Ser Asn
485 490 495 500

CCT TTC AGA CGC GCC GGG CTA ATC GCC GCT TCT ACC ACC AAT AAC GGC 1817
Pro Phe Arg Arg Ala Gly Leu Ile Ala Ala Ser Thr Thr Asn Asn Gly
505 510 515

GCG ATG AAT GGG ATT GGC TTT CAA GTG GGC TAT AAG CAA TTC TTT GGG 1865
Ala Met Asn Gly Ile Gly Phe Gln Val Gly Tyr Lys Gln Phe Phe Gly
520 525 530

AAA AAC AAA CGA TGG GGC GCG AGA TAC TAC GGC TTT GTG GAT TAC AAC 1913
Lys Asn Lys Arg Trp Gly Ala Arg Tyr Tyr Gly Phe Val Asp Tyr Asn
535 540 545

CAC ACC TAT AAC AAG TCC CAA TTT TTC AAC TCC GAT TCT GAT GTT TGG 1961
His Thr Tyr Asn Lys Ser Gln Phe Phe Asn Ser Asp Ser Asp Val Trp
550 555 560

ACT TAT GGC GTG GGG AGC GAT TTG TTA GTG AAT TTC ATC AAC GAT AAA 2009
Thr Tyr Gly Val Gly Ser Asp Leu Leu Val Asn Phe Ile Asn Asp Lys
565 570 575 580

GCC ACT AAA CAC AAT AAA ATT TCT TTT GGC GCG TTT GGC GGT ATC CAA 2057
Ala Thr Lys His Asn Lys Ile Ser Phe Gly Ala Phe Gly Gly Ile Gln
585 590 595

CTA GCC GGG ACT TCA TGG CTT AAT TCT CAG TAT GTG AAT TTA GCG AAT 2105
Leu Ala Gly Thr Ser Trp Leu Asn Ser Gln Tyr Val Asn Leu Ala Asn
600 605 610

GTG AAC AAT TAT TAT AAA GCT AAA ATC AAC ACC TCT AAC TTC CAA TTC 2153
Val Asn Asn Tyr Tyr Lys Ala Lys Ile Asn Thr Ser Asn Phe Gln Phe
615 620 625

TTA TTC AAT CTG GGC TTA AGG ACC AAT CTC GCC AGA AAT AAA AGA ATA 2201
Leu Phe Asn Leu Gly Leu Arg Thr Asn Leu Ala Arg Asn Lys Arg Ile
630 635 640

GGC GCT GAT CAT AGC GCG CAA CAT GGC ATG GAA TTA GGC GTG AAG ATC 2249
Gly Ala Asp His Ser Ala Gln His Gly Met Glu Leu Gly Val Lys Ile
645 650 655 660

CCC ACG ATC AAC ACA AAT TAC TAT TCT TTG CTA GGC ACT ACC TTG CAA 2297
Pro Thr Ile Asn Thr Asn Tyr Tyr Ser Leu Leu Gly Thr Thr Leu Gln
665 670 675

TAC AGA AGG CTT TAT AGC GTG TAT CTC AAC TAT GTG TTT GCT TAC TAAAA 2347
Tyr Arg Arg Leu Tyr Ser Val Tyr Leu Asn Tyr Val Phe Ala Tyr
680 685 690

GCTTAAACTC CTTTTTAAAC TCCCTTTTTA GGGGGTTTAA TCTTTTTAAC TGACTTTTCT 2407
TTTAGCTTTT TTTAATTTTT TCCACCAAAC AAAGTTTTTT GACTTCAAGC GTTAATCACA 2467
AAAAATACTC AAAGGCGTTT TTTGCAATCT AAATAAAAAA TTAGCGTTAT TCAAGCGATC 2527
ATTTTAAACC ACCCAAGCAA GAAACCCCAA ACATCTTTAG CGTTCGCGCG CTCCACTAAC 2587
CAAAAAACGC CCCAAA 2603

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 711 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(ix) FEATURE:
(A) NAME/KEY: Signal Sequence
(B) LOCATION: 1...20
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Met Lys Lys Lys Phe Leu Ser Leu Thr Leu Gly Ser Leu Leu Val Ser
-20 -15 -10 -5

Ala Leu Ser Ala Glu Asp Asn Gly Phe Phe Val Ser Ala Gly Tyr Gln
1 5 10

Ile Gly Glu Ser Ala Gln Met Val Lys Asn Thr Lys Gly Ile Gln Asp
15 20 25

Leu Ser Asp Ser Tyr Glu Arg Leu Asn Asn Leu Leu Thr Ser Tyr Ser
30 35 40

Ala Leu Asn Thr Leu Ile Arg Gln Ser Ala Asp Pro Asn Ala Ile Asn
45 50 55 60

Asn Ala Arg Gly Asn Leu Asn Ala Ser Ala Lys Asn Leu Ile Asn Asp
65 70 75

Lys Lys Asn Ser Pro Ala Tyr Gln Ala Val Leu Leu Ala Leu Asn Ala
80 85 90

Ala Ala Gly Leu Trp Gln Val Met Ser Tyr Ser Ile Ser Val Cys Gly
95 100 105

Pro Gly Ser Asp Lys Asn Lys Asn Gly Gly Val Gln Thr Phe Glu Asn
110 115 120

Val Pro Ser Asn Gly Gly Thr Thr Ile Ala Cys Asp Ser Phe Tyr Glu
125 130 135 140

Pro Gly Lys Trp Ser Gly Ile Ser Thr Glu Asn Tyr Ala Lys Ile Asn
145 150 155

Lys Ala Tyr Gln Ile Ile Gln Lys Ala Phe Gly Ala Ser Gly Gln Asp
160 165 170

Ile Pro Ala Leu Ser Asp Thr Lys Glu Leu Asn Phe Glu Ile Lys Gly
175 180 185

Lys Lys Asn Asp Ser Val Gln Pro Gly Glu Arg Trp Lys Phe Pro Trp
190 195 200

Thr Asn Gly Lys Phe Val Ser Val Lys Trp Val Asn Gly Lys Tyr Glu
205 210 215 220

Glu Ile Lys Glu Asp Ile Lys Val Ser Asn Asn Ala Gln Glu Leu Leu
225 230 235

Lys Gln Ala Ser Thr Ile Leu Thr Thr Leu Asn Glu Ala Cys Pro Trp
240 245 250

Leu Ser Asn Gly Gly Ala Gly Asn Val Ala Gly Gly Asn Ser Leu Trp
255 260 265

Ala Gly Ile Asp Lys Gly Asp Gly Ser Ala Cys Gly Ile Phe Lys Asn
270 275 280

Glu Ile Ser Ala Ile Gln Asp Met Ile Lys Asn Ala Glu Ile Ala Val
285 290 295 300

Glu Gln Ser Lys Ile Val Thr Ala Asn Ala Gln Asn Gln His Asn Leu
305 310 315

Asp Thr Gly Lys Ala Phe Asn Pro Tyr Lys Asp Ala Asn Phe Ala Gln
320 325 330

Ser Met Phe Ala Asn Ala Arg Ala Gln Ala Glu Ile Leu Asn Arg Ala
335 340 345

Gln Ala Val Val Lys Asp Phe Glu Arg Ile Pro Ala Ala Phe Val Lys
350 355 360

Asp Ser Leu Gly Val Cys His Glu Lys Gly Ser Asp Gly Asn Leu Arg
365 370 375 380

Gly Thr Pro Ser Gly Thr Val Thr Ser Asn Thr Trp Gly Ala Gly Cys
385 390 395

Ala Tyr Val Gly Glu Thr Val Thr Asn Leu Lys Asn Ser Ile Ala His
400 405 410

Phe Gly Asp Gln Ala Glu Arg Ile His Asn Ala Arg Asn Leu Ala Tyr
415 420 425

Thr Leu Ala Asn Phe Ser Gly Gln Tyr Lys Lys Leu Gly Glu His Tyr
430 435 440

Asp Ser Ile Thr Ala Ala Leu Ser Ser Leu Pro Asp Ala Gln Ser Leu
445 450 455 460

Gln Asn Val Val Ser Lys Lys Thr Asn Pro Asn Ser Pro Gln Gly Ile
465 470 475

Gln Asp Asn Tyr Tyr Ile Asp Ser Asn Ile His Ser Gln Val Gln Ser
480 485 490

Arg Ser Gln Glu Leu Gly Ser Asn Pro Phe Arg Arg Ala Gly Leu Ile
495 500 505

Ala Ala Ser Thr Thr Asn Asn Gly Ala Met Asn Gly Ile Gly Phe Gln
510 515 520

Val Gly Tyr Lys Gln Phe Phe Gly Lys Asn Lys Arg Trp Gly Ala Arg
525 530 535 540

Tyr Tyr Gly Phe Val Asp Tyr Asn His Thr Tyr Asn Lys Ser Gln Phe
545 550 555

Phe Asn Ser Asp Ser Asp Val Trp Thr Tyr Gly Val Gly Ser Asp Leu
560 565 570

Leu Val Asn Phe Ile Asn Asp Lys Ala Thr Lys His Asn Lys Ile Ser
575 580 585

Phe Gly Ala Phe Gly Gly Ile Gln Leu Ala Gly Thr Ser Trp Leu Asn
590 595 600


Ser Gln Tyr Val Asn Leu Ala Asn Val Asn Asn Tyr Tyr Lys Ala Lys
605 610 615 620

Ile Asn Thr Ser Asn Phe Gln Phe Leu Phe Asn Leu Gly Leu Arg Thr
625 630 635

Asn Leu Ala Arg Asn Lys Arg Ile Gly Ala Asp His Ser Ala Gln His
640 645 650

Gly Met Glu Leu Gly Val Lys Ile Pro Thr Ile Asn Thr Asn Tyr Tyr
655 660 665

Ser Leu Leu Gly Thr Thr Leu Gln Tyr Arg Arg Leu Tyr Ser Val Tyr
670 675 680

Leu Asn Tyr Val Phe Ala Tyr
685 690

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2427 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 232...2247
(D) OTHER INFORMATION:

(A) NAME/KEY: Signal Sequence
(B) LOCATION: 232...292
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

AAAACGCGCA GCAAAAAATC TCTGTTAAGC TTTTATCATT AGCGTTCCAT TGAAACAAAA 60
TCTAAAAACC CTTTCCAATA CCACCCAAAC AAACGCGCAA AAAATGCAAA AATTCTAAAT 120
TTTCTCCAAA TGACAAAAAA AAAAAAAACG ATTTTATGCT ACAATGCTTT TAATACATTC 180

TTACTTAATG TATAAAATCT CAATCACTCA ATTTAATTTC AAAGGATATT T ATG AAA 237
Met Lys
-20

AAA ACC CTT TTA CTC TCT CTC TCT CTC TCT CTC TCG TCA TCG CTT TTA 285
Lys Thr Leu Leu Leu Ser Leu Ser Leu Ser Leu Ser Ser Ser Leu Leu
-15 -10 -5

AAC GCT GAA GAC AAC GGC TTT TTT ATC AGC GCG GGC TAT CAA ATC GGT 333
Asn Ala Glu Asp Asn Gly Phe Phe Ile Ser Ala Gly Tyr Gln Ile Gly
1 5 10

GAA GCC GCT CAA ATG GTG AAA AAC ACC GGC GAA TTG AAA AAA CTT TCA 381
Glu Ala Ala Gln Met Val Lys Asn Thr Gly Glu Leu Lys Lys Leu Ser
15 20 25 30

GAC ACT TAT GAG AAT TTG AGC AAC CTT TTA ACC AAT TTT AAC AAC CTC 429
Asp Thr Tyr Glu Asn Leu Ser Asn Leu Leu Thr Asn Phe Asn Asn Leu
35 40 45

AAT CAA GCG GTA ACG AAC GCG AGC AGC CCT TCA GAA ATC AAT GCC ACG 477
Asn Gln Ala Val Thr Asn Ala Ser Ser Pro Ser Glu Ile Asn Ala Thr
50 55 60

ATC GAT AAT TTA AAA GCA AAC ACG CAA GGG CTG ATT GGC GAA AAA ACC 525
Ile Asp Asn Leu Lys Ala Asn Thr Gln Gly Leu Ile Gly Glu Lys Thr
65 70 75

AAT TCC CCG GCG TAT CAA GCG GTG TAT TTG GCG CTC AAT GCG GCG GTG 573
Asn Ser Pro Ala Tyr Gln Ala Val Tyr Leu Ala Leu Asn Ala Ala Val
80 85 90

GGG CTG TGG AAT GTG ATA GCC TAT AAT GTC CAA TGC GGT CCT GGT AAG 621
Gly Leu Trp Asn Val Ile Ala Tyr Asn Val Gln Cys Gly Pro Gly Lys
95 100 105 110

AGT GGG GAT CAA AGC GTA ATT TTT GAT GGC CAA CCA GGA CAT GAT TCA 669
Ser Gly Asp Gln Ser Val Ile Phe Asp Gly Gln Pro Gly His Asp Ser
115 120 125

AGA TCC ATT AAT TGC AAT TTA ACC GGT TAT AAC AAC GGG GTT AGC GGC 717
Arg Ser Ile Asn Cys Asn Leu Thr Gly Tyr Asn Asn Gly Val Ser Gly
130 135 140

CCT TTA TCC ATT GAC AAT TTT AAA ACG CTT AAT CAA GCT TAT CAA ACT 765
Pro Leu Ser Ile Asp Asn Phe Lys Thr Leu Asn Gln Ala Tyr Gln Thr
145 150 155

ATC CAA CAA GCT TTA AAA CAA GAT AGC GGA TTT CCT GTT TTG GAT AGT 813
Ile Gln Gln Ala Leu Lys Gln Asp Ser Gly Phe Pro Val Leu Asp Ser
160 165 170

AAA GGA AAA CAA GTA ACT ATA AAA ATA ACA ACA CAA ACT AAT GGA GCT 861
Lys Gly Lys Gln Val Thr Ile Lys Ile Thr Thr Gln Thr Asn Gly Ala
175 180 185 190

AAT AAA AGT GAA ACT ACT ACT ACT ACT ACT ACT ACT AAT GAC GCT CAA 909
Asn Lys Ser Glu Thr Thr Thr Thr Thr Thr Thr Thr Asn Asp Ala Gln
195 200 205

ACC CTT TTG CAA GAA GCC AGT AAA ATG ATA AGC GTC CTC ACT ACA AAC 957
Thr Leu Leu Gln Glu Ala Ser Lys Met Ile Ser Val Leu Thr Thr Asn
210 215 220

TGC CCA TGG GTA AAT ACC GCT CAT AAC TCA AAC GGG GGT GCA CCG TGG 1005
Cys Pro Trp Val Asn Thr Ala His Asn Ser Asn Gly Gly Ala Pro Trp
225 230 235

AAT TTA AAT ACG ACA GGG AAT GTG TGT CAG GTT TTT GCC ACG GAG TTT 1053
Asn Leu Asn Thr Thr Gly Asn Val Cys Gln Val Phe Ala Thr Glu Phe
240 245 250

AGC GCC GTT ACT AGC ATG ATC AAA AAC GCG CAA GAA ATC GTA ACG CAA 1101
Ser Ala Val Thr Ser Met Ile Lys Asn Ala Gln Glu Ile Val Thr Gln
255 260 265 270

GCT CAA AGC CTT AAC AAC CCG CAA AGC AAT CAA AAC GCG CCG AAA GAT 1149
Ala Gln Ser Leu Asn Asn Pro Gln Ser Asn Gln Asn Ala Pro Lys Asp
275 280 285

TTC AAT CCT TAC ACC TCT GCT GAT AGG GCT TTC GCT CAA AAC ATG CTC 1197
Phe Asn Pro Tyr Thr Ser Ala Asp Arg Ala Phe Ala Gln Asn Met Leu
290 295 300

AAT CAC GCG CAA GCG CAA GCC AAG ATG CTT GAA CTA GCC GAT CAA ATG 1245
Asn His Ala Gln Ala Gln Ala Lys Met Leu Glu Leu Ala Asp Gln Met
305 310 315

AAA AAA GAC CTT AAC ACT ATC CCA AAA CAA TTT ATC ACA AAC TAC TTG 1293
Lys Lys Asp Leu Asn Thr Ile Pro Lys Gln Phe Ile Thr Asn Tyr Leu
320 325 330

GCA GCT TGC CGC AAT GGG GGT GGG ACA TTA CCT GAT GCA GGG GTT ACT 1341
Ala Ala Cys Arg Asn Gly Gly Gly Thr Leu Pro Asp Ala Gly Val Thr
335 340 345 350

TCT AAC ACT TGG GGG GCC GGT TGC GCC TAT GTG GAA GAG ACG ATA ACC 1389
Ser Asn Thr Trp Gly Ala Gly Cys Ala Tyr Val Glu Glu Thr Ile Thr
355 360 365

GCC CTA AAT AAC AGC CTT GCG CAT TTT GGC ACT CAA GCC GAT CAA ATC 1437
Ala Leu Asn Asn Ser Leu Ala His Phe Gly Thr Gln Ala Asp Gln Ile
370 375 380

AAG CAA TCT GAG TTG TTG GCG CGC ACG ATA CTT GAT TTT AGA GGC AGC 1485
Lys Gln Ser Glu Leu Leu Ala Arg Thr Ile Leu Asp Phe Arg Gly Ser
385 390 395

CTT AAG GAT TTA AAC AAC ACT TAT AAC AGC ATC ACC ACG ACC GCT TCA 1533
Leu Lys Asp Leu Asn Asn Thr Tyr Asn Ser Ile Thr Thr Thr Ala Ser
400 405 410

AAC ACG CCC AAT TCC CCA TTC CTT AAA AAT TTG ATA AGC CAA TCC ACT 1581
Asn Thr Pro Asn Ser Pro Phe Leu Lys Asn Leu Ile Ser Gln Ser Thr
415 420 425 430

AAC CCT AAT AAC CCC GGG GGC TTA CAG GCC GTT TAT CAA GTC AAC CAA 1629
Asn Pro Asn Asn Pro Gly Gly Leu Gln Ala Val Tyr Gln Val Asn Gln
435 440 445

AGC GCT TAT TCG CAA TTA TTA AGC GCC ACG CAA GAA TTA GGG CAT AAC 1677
Ser Ala Tyr Ser Gln Leu Leu Ser Ala Thr Gln Glu Leu Gly His Asn
450 455 460

CCT TTC AGA CGC GTT GGC TTA ATC AGC TCT CAA ACC AAC AAC GGT GCG 1725
Pro Phe Arg Arg Val Gly Leu Ile Ser Ser Gln Thr Asn Asn Gly Ala
465 470 475

ATG AAT GGG ATC GGC GTG CAA ATA GGG TAT AAA CAA TTT TTT GGT GAA 1773
Met Asn Gly Ile Gly Val Gln Ile Gly Tyr Lys Gln Phe Phe Gly Glu






480 485 490 

AAA AGA AGA TGG GGG TTA AGG TAT TAT GGT TTT TTT GAT TAC AAC CAT 1821
Lys Arg Arg Trp Gly Leu Arg Tyr Tyr Gly Phe Phe Asp Tyr Asn His
495 500 505 510

GCT TAT ATC AAA TCC AGC TTT TTC AAC TCC GCC TCT GAT GTG TTC ACT 1869
Ala Tyr Ile Lys Ser Ser Phe Phe Asn Ser Ala Ser Asp Val Phe Thr
515 520 525

TAT GGG GTA GGA ACA GAT GTC CTC TAT AAC TTT ATC AAC GAT AAA GCC 1917
Tyr Gly Val Gly Thr Asp Val Leu Tyr Asn Phe Ile Asn Asp Lys Ala
530 535 540

ACC AAA AAC AAT AAG ATT TCT TTT GGG GTG TTT GGG GGG ATT GCG TTA 1965
Thr Lys Asn Asn Lys Ile Ser Phe Gly Val Phe Gly Gly Ile Ala Leu
545 550 555

GCT GGC ACT TCG TGG CTT AAT TCT CAA TAC GTG AAT TTA GCG ACA TTC 2013
Ala Gly Thr Ser Trp Leu Asn Ser Gln Tyr Val Asn Leu Ala Thr Phe
560 565 570

AAT AAT TTT TAC AGC GCT AAA ATG AAT GTG GCG AAT TTC CAA TTC TTA 2061
Asn Asn Phe Tyr Ser Ala Lys Met Asn Val Ala Asn Phe Gln Phe Leu
575 580 585 590

TTC AAC TTG GGC TTG AGA ATG AAT CTC GCT AAA AAC AAA AAG AAA GCG 2109
Phe Asn Leu Gly Leu Arg Met Asn Leu Ala Lys Asn Lys Lys Lys Ala
595 600 605

AGC GAT CAT GTA GCT CAG CAT GGC GTG GAA CTA GGC GTG AAG ATC CCT 2157
Ser Asp His Val Ala Gln His Gly Val Glu Leu Gly Val Lys Ile Pro
610 615 620

ACG ATC AAC ACG AAT TAC TAT TCT TTG CTA GGC ACT CAA CTC CAA TAC 2205
Thr Ile Asn Thr Asn Tyr Tyr Ser Leu Leu Gly Thr Gln Leu Gln Tyr
625 630 635

CGC AGG CTT TAT AGC GTG TAT TTG AAT TAT GTG TTT GCT TAC TAATATCTG 2256
Arg Arg Leu Tyr Ser Val Tyr Leu Asn Tyr Val Phe Ala Tyr
640 645 650

TCTTTTTGTG AAACTCCCTT TTTAAGGGAT TTTTTTTGAA GCCTTTCTTT TTTTAAACCC 2316
TCTTTTTTGG GGGTCAAGCG TAAAATTCAC CCCTATCCCT TTAAGAAAAT AAAATAAAAG 2376
AAAATGCGTT TTATAACAAA ATAAGATCTA AAACAATAAA ACAAAAACCC A 2427



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 672 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(ix) FEATURE:

(A) NAME/KEY: Signal Sequence
(B) LOCATION: 1...20
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Met Lys Lys Thr Leu Leu Leu Ser Leu Ser Leu Ser Leu Ser Ser Ser
-20 -15 -10 -5

Leu Leu Asn Ala Glu Asp Asn Gly Phe Phe Ile Ser Ala Gly Tyr Gln
1 5 10

Ile Gly Glu Ala Ala Gln Met Val Lys Asn Thr Gly Glu Leu Lys Lys
15 20 25

Leu Ser Asp Thr Tyr Glu Asn Leu Ser Asn Leu Leu Thr Asn Phe Asn
30 35 40

Asn Leu Asn Gln Ala Val Thr Asn Ala Ser Ser Pro Ser Glu Ile Asn
45 50 55 60

Ala Thr Ile Asp Asn Leu Lys Ala Asn Thr Gln Gly Leu Ile Gly Glu
65 70 75

Lys Thr Asn Ser Pro Ala Tyr Gln Ala Val Tyr Leu Ala Leu Asn Ala
80 85 90

Ala Val Gly Leu Trp Asn Val Ile Ala Tyr Asn Val Gln Cys Gly Pro
95 100 105

Gly Lys Ser Gly Asp Gln Ser Val Ile Phe Asp Gly Gln Pro Gly His
110 115 120

Asp Ser Arg Ser Ile Asn Cys Asn Leu Thr Gly Tyr Asn Asn Gly Val
125 130 135 140

Ser Gly Pro Leu Ser Ile Asp Asn Phe Lys Thr Leu Asn Gln Ala Tyr
145 150 155

Gln Thr Ile Gln Gln Ala Leu Lys Gln Asp Ser Gly Phe Pro Val Leu
160 165 170

Asp Ser Lys Gly Lys Gln Val Thr Ile Lys Ile Thr Thr Gln Thr Asn
175 180 185

Gly Ala Asn Lys Ser Glu Thr Thr Thr Thr Thr Thr Thr Thr Asn Asp
190 195 200

Ala Gln Thr Leu Leu Gln Glu Ala Ser Lys Met Ile Ser Val Leu Thr
205 210 215 220

Thr Asn Cys Pro Trp Val Asn Thr Ala His Asn Ser Asn Gly Gly Ala
225 230 235

Pro Trp Asn Leu Asn Thr Thr Gly Asn Val Cys Gln Val Phe Ala Thr
240 245 250

Glu Phe Ser Ala Val Thr Ser Met Ile Lys Asn Ala Gln Glu Ile Val
255 260 265

Thr Gln Ala Gln Ser Leu Asn Asn Pro Gln Ser Asn Gln Asn Ala Pro
270 275 280

Lys Asp Phe Asn Pro Tyr Thr Ser Ala Asp Arg Ala Phe Ala Gln Asn
285 290 295 300

Met Leu Asn His Ala Gln Ala Gln Ala Lys Met Leu Glu Leu Ala Asp
305 310 315

Gln Met Lys Lys Asp Leu Asn Thr Ile Pro Lys Gln Phe Ile Thr Asn
320 325 330

Tyr Leu Ala Ala Cys Arg Asn Gly Gly Gly Thr Leu Pro Asp Ala Gly
335 340 345

Val Thr Ser Asn Thr Trp Gly Ala Gly Cys Ala Tyr Val Glu Glu Thr
350 355 360

Ile Thr Ala Leu Asn Asn Ser Leu Ala His Phe Gly Thr Gln Ala Asp
365 370 375 380

Gln Ile Lys Gln Ser Glu Leu Leu Ala Arg Thr Ile Leu Asp Phe Arg
385 390 395

Gly Ser Leu Lys Asp Leu Asn Asn Thr Tyr Asn Ser Ile Thr Thr Thr
400 405 410

Ala Ser Asn Thr Pro Asn Ser Pro Phe Leu Lys Asn Leu Ile Ser Gln
415 420 425

Ser Thr Asn Pro Asn Asn Pro Gly Gly Leu Gln Ala Val Tyr Gln Val
430 435 440

Asn Gln Ser Ala Tyr Ser Gln Leu Leu Ser Ala Thr Gln Glu Leu Gly
445 450 455 460

His Asn Pro Phe Arg Arg Val Gly Leu Ile Ser Ser Gln Thr Asn Asn
465 470 475

Gly Ala Met Asn Gly Ile Gly Val Gln Ile Gly Tyr Lys Gln Phe Phe
480 485 490

Gly Glu Lys Arg Arg Trp Gly Leu Arg Tyr Tyr Gly Phe Phe Asp Tyr
495 500 505

Asn His Ala Tyr Ile Lys Ser Ser Phe Phe Asn Ser Ala Ser Asp Val
510 515 520

Phe Thr Tyr Gly Val Gly Thr Asp Val Leu Tyr Asn Phe Ile Asn Asp
525 530 535 540

Lys Ala Thr Lys Asn Asn Lys Ile Ser Phe Gly Val Phe Gly Gly Ile
545 550 555

Ala Leu Ala Gly Thr Ser Trp Leu Asn Ser Gln Tyr Val Asn Leu Ala
560 565 570

Thr Phe Asn Asn Phe Tyr Ser Ala Lys Met Asn Val Ala Asn Phe Gln
575 580 585

Phe Leu Phe Asn Leu Gly Leu Arg Met Asn Leu Ala Lys Asn Lys Lys
590 595 600

Lys Ala Ser Asp His Val Ala Gln His Gly Val Glu Leu Gly Val Lys
605 610 615 620

Ile Pro Thr Ile Asn Thr Asn Tyr Tyr Ser Leu Leu Gly Thr Gln Leu
625 630 635

Gln Tyr Arg Arg Leu Tyr Ser Val Tyr Leu Asn Tyr Val Phe Ala Tyr
640 645 650

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2429 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA
(ix) FEATURE:

(A) NAME/KEY: Coding Sequence
(B) LOCATION: 205...2277
(D) OTHER INFORMATION:

(A) NAME/KEY: Signal Sequence
(B) LOCATION: 205...259
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

TGAAAGAAGA CTGATTAGTC TTTCTTTTAG GGGCGATTCA AGCCTTAAAA GCCGGGTCAA 60

AATCCCCATT TTTCCCAATT TTTACAAAAA AAAAAAAAAC AAAATCTCTA AAATTTAGAG 120

CTAAAATTAG CCATAAAATT CCATTTATTG CTTATAATAT GAAGTTTCTT TGTATCAAAG 180

AAAAATCTAT TAAAAGGAGA AAAC ATG AAA AAA TCC CTC TTA CTC TCT CTT 231
Met Lys Lys Ser Leu Leu Leu Ser Leu
-15 -10

TCT CTC ATC GCT TCC TTA TCA AGA GCT GAA GAT GAC GGA TTT TAT ACG 279
Ser Leu Ile Ala Ser Leu Ser Arg Ala Glu Asp Asp Gly Phe Tyr Thr
-5 1 5

AGT GTG GGC TAT CAG ATC GGT GAA GCG GTC CAA CAA GTG AAA AAC ACA 327
Ser Val Gly Tyr Gln Ile Gly Glu Ala Val Gln Gln Val Lys Asn Thr
10 15 20

GGA GCA TTG CAA AAT CTT GCA GAC AGA TAC GAT AAC TTA AAC AAC CTT 375
Gly Ala Leu Gln Asn Leu Ala Asp Arg Tyr Asp Asn Leu Asn Asn Leu
25 30 35

TTA AAC CAA TAC AAT TAT TTA AAT TCC TTA GTC AAT TTA GCC AGC ACG 423
Leu Asn Gln Tyr Asn Tyr Leu Asn Ser Leu Val Asn Leu Ala Ser Thr
40 45 50 55

CCG AGC GCG ATC ACC GGT GCG ATT GAT AAT TTA AGC TCA AGC GCG ATT 471
Pro Ser Ala Ile Thr Gly Ala Ile Asp Asn Leu Ser Ser Ser Ala Ile
60 65 70

AAC CTC ACT AGC GCC ACC ACC ACT TCC CCC GCC TAT CAA GCT GTG GCT 519
Asn Leu Thr Ser Ala Thr Thr Thr Ser Pro Ala Tyr Gln Ala Val Ala
75 80 85

TTA GCG CTC AAT GCC GCT GTG GGC ATG TGG CAA GTC ATA GCC CTT TTT 567
Leu Ala Leu Asn Ala Ala Val Gly Met Trp Gln Val Ile Ala Leu Phe
90 95 100

ATT GGC TGT GGC CCT GGC CCT ACC AAT AAT CAA AGC TAT CAA TCG TTT 615
Ile Gly Cys Gly Pro Gly Pro Thr Asn Asn Gln Ser Tyr Gln Ser Phe
105 110 115

GGT AAC ACA CCA GCC CTT AAT GGG ACC ACC ACC ACT TGC AAT CAA GCA 663
Gly Asn Thr Pro Ala Leu Asn Gly Thr Thr Thr Thr Cys Asn Gln Ala
120 125 130 135

TAT GGG ACA GGC CCT AAT GGC ATC CTA TCT ATT GAT GAA TAC CAA AAA 711
Tyr Gly Thr Gly Pro Asn Gly Ile Leu Ser Ile Asp Glu Tyr Gln Lys
140 145 150

CTC AAC CAA GCT TAT CAG ATC ATC CAA ACC GCT TTA AAC CAA AAT CAA 759
Leu Asn Gln Ala Tyr Gln Ile Ile Gln Thr Ala Leu Asn Gln Asn Gln
155 160 165

GGG GGT GGG ATG CCT GCC TTG AAT GAC ACC ACC AAA ACA GGG GTA GTC 807
Gly Gly Gly Met Pro Ala Leu Asn Asp Thr Thr Lys Thr Gly Val Val
170 175 180

AAC ATA CAA CAA ACC AAT TAT AGG ACC ACC ACA CAA AAC AAT ATC ATA 855
Asn Ile Gln Gln Thr Asn Tyr Arg Thr Thr Thr Gln Asn Asn Ile Ile
185 190 195

GAG CAT TAT TAT ACA GAG AAT GGG AAA GAG ATC CCA GTC TCT TAT TCA 903
Glu His Tyr Tyr Thr Glu Asn Gly Lys Glu Ile Pro Val Ser Tyr Ser
200 205 210 215

GGC GGA TCA TCA TTC TCG CCT ACA ATA CAA TTG ACA TAC CAT AAT AAC 951
Gly Gly Ser Ser Phe Ser Pro Thr Ile Gln Leu Thr Tyr His Asn Asn
220 225 230

GCT GAA AAC CTT TTG CAA CAA GCC GCC ACT ATC ATG CAA GTC CTT ATT 999
Ala Glu Asn Leu Leu Gln Gln Ala Ala Thr Ile Met Gln Val Leu Ile
235 240 245

ACT CAA AAG CCG CAT GTG CAA ACG AGC AAT GGC GGT AAA GCG TGG GGG 1047
Thr Gln Lys Pro His Val Gln Thr Ser Asn Gly Gly Lys Ala Trp Gly
250 255 260

TTG AGT TCT ACG CCT GGG AAT GTG ATG GAT ATT TTT GGT CCT TCT TTT 1095
Leu Ser Ser Thr Pro Gly Asn Val Met Asp Ile Phe Gly Pro Ser Phe
265 270 275

AAC GCT ATT AAT GAG ATG ATT AAA AAC GCT CAA ACA GCC CTA GCA AAA 1143
Asn Ala Ile Asn Glu Met Ile Lys Asn Ala Gln Thr Ala Leu Ala Lys
280 285 290 295

ACC CAA CAG CTT AAC GCT AAT GAA AAC GCC CAA ATC ACG CAA CCC AAC 1191
Thr Gln Gln Leu Asn Ala Asn Glu Asn Ala Gln Ile Thr Gln Pro Asn
300 305 310

AAT TTC AAC CCC TAC ACC TCT AAA GAC AAA GGG TTC GCT CAA GAA ATG 1239
Asn Phe Asn Pro Tyr Thr Ser Lys Asp Lys Gly Phe Ala Gln Glu Met
315 320 325

CTC AAT AGA GCT GAA GCT CAA GCA GAG ATT TTA AAT TTA GCT AAG CAA 1287
Leu Asn Arg Ala Glu Ala Gln Ala Glu Ile Leu Asn Leu Ala Lys Gln
330 335 340

GTA GCG AAC AAT TTC CAC AGC ATT CAA GGG CCT ATT CAA GGG GAT TTA 1335
Val Ala Asn Asn Phe His Ser Ile Gln Gly Pro Ile Gln Gly Asp Leu
345 350 355

GAA GAA TGT AAA GCA GGA TCG GCT GGC GTG ATC ACT AAT AAC ACT TGG 1383
Glu Glu Cys Lys Ala Gly Ser Ala Gly Val Ile Thr Asn Asn Thr Trp
360 365 370 375

GGT TCA GGT TGC GCG TTT GTG AAA GAA ACT TTA AAC TCT TTA GAG CAA 1431
Gly Ser Gly Cys Ala Phe Val Lys Glu Thr Leu Asn Ser Leu Glu Gln
380 385 390

CAC ACC GCT TAT TAC GGC AAC CAG GTC AAT CAG GAT AGG GCT TTG GCT 1479
His Thr Ala Tyr Tyr Gly Asn Gln Val Asn Gln Asp Arg Ala Leu Ala
395 400 405




CAA ACC ATT TTG AAT TTT AAA GAA GCC CTT AAC ACC CTG AAT AAA GAC 1527
Gln Thr Ile Leu Asn Phe Lys Glu Ala Leu Asn Thr Leu Asn Lys Asp
410 415 420

TCA AAA GCG ATC AAT AGC GGT ATC TCC AAC TTG CCT AAC GCT AAA TCT 1575
Ser Lys Ala Ile Asn Ser Gly Ile Ser Asn Leu Pro Asn Ala Lys Ser
425 430 435

CTT CAA AAC ATG ACG CAT GCC ACT CAA AAC CCT AAT TCC CCA GAA GGT 1623
Leu Gln Asn Met Thr His Ala Thr Gln Asn Pro Asn Ser Pro Glu Gly
440 445 450 455

CTG CTC ACT TAT TCT TTG GAT TCA AGC AAA TAC AAC CAG CTC CAA ACC 1671
Leu Leu Thr Tyr Ser Leu Asp Ser Ser Lys Tyr Asn Gln Leu Gln Thr
460 465 470

ATC GCG CAA GAA TTG GGC AAA AAC CCT TTC AGG CGC TTT GGC GTG ATT 1719
Ile Ala Gln Glu Leu Gly Lys Asn Pro Phe Arg Arg Phe Gly Val Ile
475 480 485

GAC TTT CAA AAC AAC AAC GGC GCA ATG AAC GGG ATC GGC GTG CAA GTG 1767
Asp Phe Gln Asn Asn Asn Gly Ala Met Asn Gly Ile Gly Val Gln Val
490 495 500

GGT TAT AAA CAA TTC TTT GGT AAA AAA AGG AAT TGG GGG TTA AGG TAT 1815
Gly Tyr Lys Gln Phe Phe Gly Lys Lys Arg Asn Trp Gly Leu Arg Tyr
505 510 515

TAT GGT TTC TTT GAT TAT AAC CAT GCT TAT ATC AAA TCT AAT TTT TTC 1863
Tyr Gly Phe Phe Asp Tyr Asn His Ala Tyr Ile Lys Ser Asn Phe Phe
520 525 530 535

AAC TCC GCT TCT GAT GTG TGG ACT TAT GGG GTG GGT ATG GAC GCT CTC 1911
Asn Ser Ala Ser Asp Val Trp Thr Tyr Gly Val Gly Met Asp Ala Leu
540 545 550

TAT AAC TTC ATC AAC GAT AAA AAC ACC AAC TTT TTA GGC AAG AAC AAC 1959
Tyr Asn Phe Ile Asn Asp Lys Asn Thr Asn Phe Leu Gly Lys Asn Asn
555 560 565

AAG CTT TCA GTA GGG CTT TTT GGA GGC TTT GCG TTA GCC GGG ACT TCG 2007
Lys Leu Ser Val Gly Leu Phe Gly Gly Phe Ala Leu Ala Gly Thr Ser
570 575 580

TGG CTT AAT TCC CAA CAA GTG AAT TTG ACC ATG ATG AAT GGC ATT TAT 2055
Trp Leu Asn Ser Gln Gln Val Asn Leu Thr Met Met Asn Gly Ile Tyr
585 590 595

AAC GCT AAT GTC AGC ACT TCT AAC TTC CAA TTT TTG TTT GAT TTA GGC 2103
Asn Ala Asn Val Ser Thr Ser Asn Phe Gln Phe Leu Phe Asp Leu Gly
600 605 610 615

TTG AGA ATG AAC CTC GCT AGG CCT AAG AAA AAA GAC AGC GAT CAT GCC 2151
Leu Arg Met Asn Leu Ala Arg Pro Lys Lys Lys Asp Ser Asp His Ala
620 625 630

GCT CAG CAT GGC ATT GAA CTA GGT TTT AAG ATC CCC ACG ATC AAC ACC 2199
Ala Gln His Gly Ile Glu Leu Gly Phe Lys Ile Pro Thr Ile Asn Thr
635 640 645

AAC TAT TAT TCT TTC ATG GGC GCT AAA CTA GAA TAC AGA AGG ATG TAT 2247
Asn Tyr Tyr Ser Phe Met Gly Ala Lys Leu Glu Tyr Arg Arg Met Tyr
650 655 660

AGC CTT TTT CTC AAT TAT GTG TTT GCT TAC TAAAAATTCT TTTTGAACCC CTC 2300
Ser Leu Phe Leu Asn Tyr Val Phe Ala Tyr
665 670

TTTTTTTGGG GGAGTGTTGC AAAAATGCCC CCCTATTTGC TTGTGAGTTT TGGTTAAAAT 2360
TTTAGTTACC CACGCTTAAA AAGCGCCAAG CCTTTTACAC ACAACTCCTT TAATTTTGTT 2420
TTTAAGAAA 2429



(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 691 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(ix) FEATURE:

(A) NAME/KEY: Signal Sequence
(B) LOCATION: 1...18
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:



Met Lys Lys Ser Leu Leu Leu Ser Leu Ser Leu Ile Ala Ser Leu Ser
-15 -10 -5

Arg Ala Glu Asp Asp Gly Phe Tyr Thr Ser Val Gly Tyr Gln Ile Gly
1 5 10

Glu Ala Val Gln Gln Val Lys Asn Thr Gly Ala Leu Gln Asn Leu Ala
15 20 25 30

Asp Arg Tyr Asp Asn Leu Asn Asn Leu Leu Asn Gln Tyr Asn Tyr Leu
35 40 45

Asn Ser Leu Val Asn Leu Ala Ser Thr Pro Ser Ala Ile Thr Gly Ala
50 55 60

Ile Asp Asn Leu Ser Ser Ser Ala Ile Asn Leu Thr Ser Ala Thr Thr
65 70 75

Thr Ser Pro Ala Tyr Gln Ala Val Ala Leu Ala Leu Asn Ala Ala Val
80 85 90

Gly Met Trp Gln Val Ile Ala Leu Phe Ile Gly Cys Gly Pro Gly Pro
95 100 105 110

Thr Asn Asn Gln Ser Tyr Gln Ser Phe Gly Asn Thr Pro Ala Leu Asn
115 120 125

Gly Thr Thr Thr Thr Cys Asn Gln Ala Tyr Gly Thr Gly Pro Asn Gly
130 135 140

Ile Leu Ser Ile Asp Glu Tyr Gln Lys Leu Asn Gln Ala Tyr Gln Ile
145 150 155

Ile Gln Thr Ala Leu Asn Gln Asn Gln Gly Gly Gly Met Pro Ala Leu
160 165 170

Asn Asp Thr Thr Lys Thr Gly Val Val Asn Ile Gln Gln Thr Asn Tyr
175 180 185 190

Arg Thr Thr Thr Gln Asn Asn Ile Ile Glu His Tyr Tyr Thr Glu Asn
195 200 205

Gly Lys Glu Ile Pro Val Ser Tyr Ser Gly Gly Ser Ser Phe Ser Pro
210 215 220

Thr Ile Gln Leu Thr Tyr His Asn Asn Ala Glu Asn Leu Leu Gln Gln
225 230 235

Ala Ala Thr Ile Met Gln Val Leu Ile Thr Gln Lys Pro His Val Gln
240 245 250

Thr Ser Asn Gly Gly Lys Ala Trp Gly Leu Ser Ser Thr Pro Gly Asn
255 260 265 270

Val Met Asp Ile Phe Gly Pro Ser Phe Asn Ala Ile Asn Glu Met Ile
275 280 285

Lys Asn Ala Gln Thr Ala Leu Ala Lys Thr Gln Gln Leu Asn Ala Asn
290 295 300

Glu Asn Ala Gln Ile Thr Gln Pro Asn Asn Phe Asn Pro Tyr Thr Ser
305 310 315

Lys Asp Lys Gly Phe Ala Gln Glu Met Leu Asn Arg Ala Glu Ala Gln
320 325 330

Ala Glu Ile Leu Asn Leu Ala Lys Gln Val Ala Asn Asn Phe His Ser
335 340 345 350

Ile Gln Gly Pro Ile Gln Gly Asp Leu Glu Glu Cys Lys Ala Gly Ser
355 360 365

Ala Gly Val Ile Thr Asn Asn Thr Trp Gly Ser Gly Cys Ala Phe Val
370 375 380

Lys Glu Thr Leu Asn Ser Leu Glu Gln His Thr Ala Tyr Tyr Gly Asn
385 390 395

Gln Val Asn Gln Asp Arg Ala Leu Ala Gln Thr Ile Leu Asn Phe Lys
400 405 410

Glu Ala Leu Asn Thr Leu Asn Lys Asp Ser Lys Ala Ile Asn Ser Gly
415 420 425 430

Ile Ser Asn Leu Pro Asn Ala Lys Ser Leu Gln Asn Met Thr His Ala
435 440 445

Thr Gln Asn Pro Asn Ser Pro Glu Gly Leu Leu Thr Tyr Ser Leu Asp
450 455 460

Ser Ser Lys Tyr Asn Gln Leu Gln Thr Ile Ala Gln Glu Leu Gly Lys
465 470 475

Asn Pro Phe Arg Arg Phe Gly Val Ile Asp Phe Gln Asn Asn Asn Gly
480 485 490

Ala Met Asn Gly Ile Gly Val Gln Val Gly Tyr Lys Gln Phe Phe Gly
495 500 505 510

Lys Lys Arg Asn Trp Gly Leu Arg Tyr Tyr Gly Phe Phe Asp Tyr Asn
515 520 525

His Ala Tyr Ile Lys Ser Asn Phe Phe Asn Ser Ala Ser Asp Val Trp
530 535 540

Thr Tyr Gly Val Gly Met Asp Ala Leu Tyr Asn Phe Ile Asn Asp Lys
545 550 555

Asn Thr Asn Phe Leu Gly Lys Asn Asn Lys Leu Ser Val Gly Leu Phe
560 565 570

Gly Gly Phe Ala Leu Ala Gly Thr Ser Trp Leu Asn Ser Gln Gln Val
575 580 585 590

Asn Leu Thr Met Met Asn Gly Ile Tyr Asn Ala Asn Val Ser Thr Ser
595 600 605

Asn Phe Gln Phe Leu Phe Asp Leu Gly Leu Arg Met Asn Leu Ala Arg
610 615 620

Pro Lys Lys Lys Asp Ser Asp His Ala Ala Gln His Gly Ile Glu Leu
625 630 635

Gly Phe Lys Ile Pro Thr Ile Asn Thr Asn Tyr Tyr Ser Phe Met Gly
640 645 650

Ala Lys Leu Glu Tyr Arg Arg Met Tyr Ser Leu Phe Leu Asn Tyr Val
655 660 665 670

Phe Ala Tyr

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2270 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA
(ix) FEATURE:

(A) NAME/KEY: Coding Sequence
(B) LOCATION: 130...2049
(D) OTHER INFORMATION:

(A) NAME/KEY: Signal Sequence
(B) LOCATION: 130...193
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

ATTGAGCGCA TCAAAACACC CTAAAACTTT TTTGAAATCC AATAAATTTA TGTTATAATT 60

AAACGCATTG TAAATAAATT CTCATTTTGA TACATTTTTA CAATAAAACA TTACTTTAAG 120

GAACATCTT ATG AAA AAA ACG AAA AAA ACG ATT CTG CTT TCT CTA ACT CTC 171
Met Lys Lys Thr Lys Lys Thr Ile Leu Leu Ser Leu Thr Leu
-20 -15 -10

GCG GCG TCA TTG CTC CAT GCT GAA GAC AAC GGC GTT TTT TTA AGC GTG 219
Ala Ala Ser Leu Leu His Ala Glu Asp Asn Gly Val Phe Leu Ser Val
-5 1 5

GGT TAT CAA ATC GGT GAA GCG GTT CAA AAA GTG AAA AAC GCC GAC AAG 267
Gly Tyr Gln Ile Gly Glu Ala Val Gln Lys Val Lys Asn Ala Asp Lys
10 15 20 25

GTG CAA AAA CTT TCA GAC ACT TAT GAA CAA TTA AGC CGG CTT TTA ACC 315
Val Gln Lys Leu Ser Asp Thr Tyr Glu Gln Leu Ser Arg Leu Leu Thr
30 35 40

AAC GAT AAT GGC ACA AAC TCA AAG ACA AGC GCG CAA ATC AAC CAA GCG 363
Asn Asp Asn Gly Thr Asn Ser Lys Thr Ser Ala Gln Ile Asn Gln Ala
45 50 55

GTT AAT AAT TTG AAC GAA CGC GCA AAA ACT TTA GCC GGT GGG ACA ACC 411
Val Asn Asn Leu Asn Glu Arg Ala Lys Thr Leu Ala Gly Gly Thr Thr
60 65 70

AAT TCC CCT GCC TAT CAA GCC ACG CTT TTA GCG TTG AGA TCG GTG TTA 459
Asn Ser Pro Ala Tyr Gln Ala Thr Leu Leu Ala Leu Arg Ser Val Leu
75 80 85

GGG CTA TGG AAT AGC ATG GGT TAT GCG GTC ATA TGC GGA GGT TAT ACC 507
Gly Leu Trp Asn Ser Met Gly Tyr Ala Val Ile Cys Gly Gly Tyr Thr
90 95 100 105

AAA AGT CCA GGC GAA AAC AAT CAA AAA GAT TTC CAC TAC ACC GAT GAG 555
Lys Ser Pro Gly Glu Asn Asn Gln Lys Asp Phe His Tyr Thr Asp Glu
110 115 120

AAT GGC AAT GGC ACT ACA ATC AAT TGC GGT GGG AGC ACA AAT AGT AAT 603
Asn Gly Asn Gly Thr Thr Ile Asn Cys Gly Gly Ser Thr Asn Ser Asn
125 130 135

GGC ACT CAT AGT TCT AGT GGC ACA AAT ACA TTA AAA GCA GAC AAA AAT 651
Gly Thr His Ser Ser Ser Gly Thr Asn Thr Leu Lys Ala Asp Lys Asn
140 145 150

GTT TCT CTA TCT ATT GAG CAA TAT GAA AAA ATC CAT GAA GCT TAT CAG 699
Val Ser Leu Ser Ile Glu Gln Tyr Glu Lys Ile His Glu Ala Tyr Gln
155 160 165

ATT CTT TCA AAA GCT TTA AAA CAA GCC GGG CTT GCT CCT TTA AAT AGC 747
Ile Leu Ser Lys Ala Leu Lys Gln Ala Gly Leu Ala Pro Leu Asn Ser
170 175 180 185

AAA GGG GAA AAG TTA GAA GCG CAT GTA ACC ACA TCA AAA CCA GAA AAT 795
Lys Gly Glu Lys Leu Glu Ala His Val Thr Thr Ser Lys Pro Glu Asn
190 195 200

AAT AGT CAA ACT AAA ACG ACA ACT TCT GTT ATT GAT ACG ACT AAT GAT 843
Asn Ser Gln Thr Lys Thr Thr Thr Ser Val Ile Asp Thr Thr Asn Asp
205 210 215

GCG CAA AAT CTT TTG ACT CAA GCG CAA ACG ATT GTC AAT ACC CTT AAA 891
Ala Gln Asn Leu Leu Thr Gln Ala Gln Thr Ile Val Asn Thr Leu Lys
220 225 230

GAT TAT TGC CCC ATG TTG ATA GCG AAA TCT AGT AGT GAA AGT AGT GGC 939
Asp Tyr Cys Pro Met Leu Ile Ala Lys Ser Ser Ser Glu Ser Ser Gly
235 240 245

GCA GCT ACT ACA AAC GCC CCT TCA TGG CAA ACA GCC GGT GGC GGC AAA 987
Ala Ala Thr Thr Asn Ala Pro Ser Trp Gln Thr Ala Gly Gly Gly Lys
250 255 260 265

AAT TCA TGT GCG ACT TTT GGT GCG GAG TTT AGT GCC GCT TCA GAC ATG 1035
Asn Ser Cys Ala Thr Phe Gly Ala Glu Phe Ser Ala Ala Ser Asp Met
270 275 280

ATT AAT AAT GCG CAA AAA ATC GTT CAA GAA ACC CAA CAA CTC AGC GCC 1083
Ile Asn Asn Ala Gln Lys Ile Val Gln Glu Thr Gln Gln Leu Ser Ala
285 290 295

AAC CAA CCA AAA AAT ATC ACA CAA CCC CAT AAT CTC AAC CTT AAC ACC 1131
Asn Gln Pro Lys Asn Ile Thr Gln Pro His Asn Leu Asn Leu Asn Thr
300 305 310

CCT AGC AGT CTT ACG GCT TTA GCT CAA AAA ATG CTC AAA AAT GCG CAA 1179
Pro Ser Ser Leu Thr Ala Leu Ala Gln Lys Met Leu Lys Asn Ala Gln
315 320 325

TCT CAA GCA GAA ATT TTA AAA CTA GCC AAT CAA GTG GAG AGC GAT TTT 1227
Ser Gln Ala Glu Ile Leu Lys Leu Ala Asn Gln Val Glu Ser Asp Phe
330 335 340 345

AAC AAA CTT TCT TCA GGC CAT CTT AAA GAC TAC ATA GGG AAA TGC GAT 1275
Asn Lys Leu Ser Ser Gly His Leu Lys Asp Tyr Ile Gly Lys Cys Asp
350 355 360

GCG AGC GCT ATA AGC AGT GCG AAT ATG ACA ATG CAA AAT CAA AAG AAC 1323
Ala Ser Ala Ile Ser Ser Ala Asn Met Thr Met Gln Asn Gln Lys Asn
365 370 375

AAT TGG GGG AAC GGG TGT GCT GGC GTG GAA GAA ACT CTG TCT TCA TTA 1371
Asn Trp Gly Asn Gly Cys Ala Gly Val Glu Glu Thr Leu Ser Ser Leu
380 385 390

AAA ACA AGT GCC GCT GAT TTT AAC AAC CAA ACG CCA CAA ATC AAT CAA 1419
Lys Thr Ser Ala Ala Asp Phe Asn Asn Gln Thr Pro Gln Ile Asn Gln
395 400 405

GCG CAA AAC CTA GCC AAC ACC CTT ATT CAA GAA CTT GGC AAC AAC CCT 1467
Ala Gln Asn Leu Ala Asn Thr Leu Ile Gln Glu Leu Gly Asn Asn Pro
410 415 420 425

TTT AGG AAT ATG GGC ATG ATC GCT TCT TCA ACC ACG AAT AAC GGC GCC 1515
Phe Arg Asn Met Gly Met Ile Ala Ser Ser Thr Thr Asn Asn Gly Ala
430 435 440

TTG AAT GGC CTT GGG GTG CAA GTG GGT TAT AAG CAA TTT TTT GGG GAA 1563
Leu Asn Gly Leu Gly Val Gln Val Gly Tyr Lys Gln Phe Phe Gly Glu
445 450 455

AAG AAA AGA TGG GGG TTA AGG TAT TAT GGT TTC TTT GAT TAC AAC CAC 1611
Lys Lys Arg Trp Gly Leu Arg Tyr Tyr Gly Phe Phe Asp Tyr Asn His
460 465 470

GCC TAT ATC AAA TCC AAT TTC TTT AAC TCG GCT TCT GAT GTG TGG ACT 1659
Ala Tyr Ile Lys Ser Asn Phe Phe Asn Ser Ala Ser Asp Val Trp Thr
475 480 485

TAT GGG GTG GGC AGC GAT TTA TTG TTT AAT TTC ATC AAT GAT AAA AAC 1707
Tyr Gly Val Gly Ser Asp Leu Leu Phe Asn Phe Ile Asn Asp Lys Asn
490 495 500 505

ACC AAC TTT TTA GGC AAG AAT AAC AAG ATT TCA GTG GGA TTT TTT GGA 1755
Thr Asn Phe Leu Gly Lys Asn Asn Lys Ile Ser Val Gly Phe Phe Gly
510 515 520

GGT ATC GCC TTA GCA GGG ACT TCA TGG CTT AAT TCT CAA TTC GTG AAT 1803
Gly Ile Ala Leu Ala Gly Thr Ser Trp Leu Asn Ser Gln Phe Val Asn
525 530 535

TTA AAA ACC ATC AGC AAT GTT TAT AGC GCT AAA GTG AAT ACG GCT AAC 1851
Leu Lys Thr Ile Ser Asn Val Tyr Ser Ala Lys Val Asn Thr Ala Asn
540 545 550

TTC CAA TTT TTA TTC AAT TTG GGC TTG AGA ACC AAT CTC GCT AGA CCT 1899
Phe Gln Phe Leu Phe Asn Leu Gly Leu Arg Thr Asn Leu Ala Arg Pro
555 560 565

AAG AAA AAA GAT AGT CAT CAT GCG GCT CAA CAT GGC ATG GAA TTG GGC 1947
Lys Lys Lys Asp Ser His His Ala Ala Gln His Gly Met Glu Leu Gly
570 575 580 585

GTG AAA ATC CCT ACC ATT AAC ACG AAT TAT TAT TCT TTT CTA GAC ACT 1995
Val Lys Ile Pro Thr Ile Asn Thr Asn Tyr Tyr Ser Phe Leu Asp Thr
590 595 600

AAA CTA GAA TAT CGA AGG CTT TAT AGC GTG TAT CTC AAT TAT GTG TTT 2043
Lys Leu Glu Tyr Arg Arg Leu Tyr Ser Val Tyr Leu Asn Tyr Val Phe
605 610 615

GCC TAT TAAAAACCCT CTTTTTAAAA AAGGGGGGGC TTTAAAAAAC CTCTAAAGAT AA 2101
Ala Tyr

AAATTTTCAA AAAACAATCA TTAAACCCTA AAAAAGAAAT TTTAAGGTAT AATGCTTTCG 2161
CCATTTTTAA TTTTCCATGG CAAACTCCTT TTTAGAATTT ATCCCCATAA TCGCTCTTAT 2221
GGGGCGTTTG TTTTGCAACA ATCTTTTCGA AACTATCCAA CAAGCTTTA 2270



(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 640 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(ix) FEATURE:

(A) NAME/KEY: Signal Sequence
(B) LOCATION: 1...21
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Met Lys Lys Thr Lys Lys Thr Ile Leu Leu Ser Leu Thr Leu Ala Ala
-20 -15 -10

Ser Leu Leu His Ala Glu Asp Asn Gly Val Phe Leu Ser Val Gly Tyr
-5 1 5 10

Gln Ile Gly Glu Ala Val Gln Lys Val Lys Asn Ala Asp Lys Val Gln
15 20 25

Lys Leu Ser Asp Thr Tyr Glu Gln Leu Ser Arg Leu Leu Thr Asn Asp
30 35 40

Asn Gly Thr Asn Ser Lys Thr Ser Ala Gln Ile Asn Gln Ala Val Asn
45 50 55

Asn Leu Asn Glu Arg Ala Lys Thr Leu Ala Gly Gly Thr Thr Asn Ser
60 65 70 75

Pro Ala Tyr Gln Ala Thr Leu Leu Ala Leu Arg Ser Val Leu Gly Leu
80 85 90

Trp Asn Ser Met Gly Tyr Ala Val Ile Cys Gly Gly Tyr Thr Lys Ser
95 100 105

Pro Gly Glu Asn Asn Gln Lys Asp Phe His Tyr Thr Asp Glu Asn Gly
110 115 120

Asn Gly Thr Thr Ile Asn Cys Gly Gly Ser Thr Asn Ser Asn Gly Thr
125 130 135

His Ser Ser Ser Gly Thr Asn Thr Leu Lys Ala Asp Lys Asn Val Ser
140 145 150 155

Leu Ser Ile Glu Gln Tyr Glu Lys Ile His Glu Ala Tyr Gln Ile Leu
160 165 170

Ser Lys Ala Leu Lys Gln Ala Gly Leu Ala Pro Leu Asn Ser Lys Gly
175 180 185

Glu Lys Leu Glu Ala His Val Thr Thr Ser Lys Pro Glu Asn Asn Ser
190 195 200

Gln Thr Lys Thr Thr Thr Ser Val Ile Asp Thr Thr Asn Asp Ala Gln
205 210 215

Asn Leu Leu Thr Gln Ala Gln Thr Ile Val Asn Thr Leu Lys Asp Tyr
220 225 230 235

Cys Pro Met Leu Ile Ala Lys Ser Ser Ser Glu Ser Ser Gly Ala Ala
240 245 250

Thr Thr Asn Ala Pro Ser Trp Gln Thr Ala Gly Gly Gly Lys Asn Ser
255 260 265

Cys Ala Thr Phe Gly Ala Glu Phe Ser Ala Ala Ser Asp Met Ile Asn
270 275 280

Asn Ala Gln Lys Ile Val Gln Glu Thr Gln Gln Leu Ser Ala Asn Gln
285 290 295

Pro Lys Asn Ile Thr Gln Pro His Asn Leu Asn Leu Asn Thr Pro Ser
300 305 310 315

Ser Leu Thr Ala Leu Ala Gln Lys Met Leu Lys Asn Ala Gln Ser Gln
320 325 330

Ala Glu Ile Leu Lys Leu Ala Asn Gln Val Glu Ser Asp Phe Asn Lys
335 340 345

Leu Ser Ser Gly His Leu Lys Asp Tyr Ile Gly Lys Cys Asp Ala Ser
350 355 360

Ala Ile Ser Ser Ala Asn Met Thr Met Gln Asn Gln Lys Asn Asn Trp
365 370 375

Gly Asn Gly Cys Ala Gly Val Glu Glu Thr Leu Ser Ser Leu Lys Thr
380 385 390 395

Ser Ala Ala Asp Phe Asn Asn Gln Thr Pro Gln Ile Asn Gln Ala Gln
400 405 410

Asn Leu Ala Asn Thr Leu Ile Gln Glu Leu Gly Asn Asn Pro Phe Arg
415 420 425

Asn Met Gly Met Ile Ala Ser Ser Thr Thr Asn Asn Gly Ala Leu Asn
430 435 440

Gly Leu Gly Val Gln Val Gly Tyr Lys Gln Phe Phe Gly Glu Lys Lys
445 450 455

Arg Trp Gly Leu Arg Tyr Tyr Gly Phe Phe Asp Tyr Asn His Ala Tyr
460 465 470 475

Ile Lys Ser Asn Phe Phe Asn Ser Ala Ser Asp Val Trp Thr Tyr Gly
480 485 490

Val Gly Ser Asp Leu Leu Phe Asn Phe Ile Asn Asp Lys Asn Thr Asn
495 500 505

Phe Leu Gly Lys Asn Asn Lys Ile Ser Val Gly Phe Phe Gly Gly Ile
510 515 520

Ala Leu Ala Gly Thr Ser Trp Leu Asn Ser Gln Phe Val Asn Leu Lys
525 530 535

Thr Ile Ser Asn Val Tyr Ser Ala Lys Val Asn Thr Ala Asn Phe Gln
540 545 550 555

Phe Leu Phe Asn Leu Gly Leu Arg Thr Asn Leu Ala Arg Pro Lys Lys
560 565 570

Lys Asp Ser His His Ala Ala Gln His Gly Met Glu Leu Gly Val Lys
575 580 585

Ile Pro Thr Ile Asn Thr Asn Tyr Tyr Ser Phe Leu Asp Thr Lys Leu
590 595 600

Glu Tyr Arg Arg Leu Tyr Ser Val Tyr Leu Asn Tyr Val Phe Ala Tyr
605 610 615

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2248 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA
(ix) FEATURE:

(A) NAME/KEY: Coding Sequence

(B) LOCATION: 173...2128
(D) OTHER INFORMATION:

(A) NAME/KEY: Signal Sequence

(B) LOCATION: 173...224
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

TGGTTTTATC GTTACAAAAT TCAACATTTC AAAGATAAAT AAGTTAAAAT ACCCCAAAAT 60
CTTTTTTTTT TTTTTGAAAT CCAATCAATT TATAGTAAAA TTAGGTTCAT TGTAAATATA 120

TTATCACTTC ATGATATTCT TACAACAAAA ACATTACTTT AAGGAACATT TT ATG AAA 178
Met Lys

AAG ACA ATT CTG CTC TCT CTC TCT GCT TCA TCG CTC TTG CAC GCT GAA 226
Lys Thr Ile Leu Leu Ser Leu Ser Ala Ser Ser Leu Leu His Ala Glu
-15 -10 -5 1

GAC AAC GGC TTT TTT GTG AGC GCC GGC TAT CAA ATC GGC GAA GCG GTG 274
Asp Asn Gly Phe Phe Val Ser Ala Gly Tyr Gln Ile Gly Glu Ala Val
5 10 15

CAA ATG GTC AAA AAC ACC GGT GAA TTG AAA AAC TTG AAC GAA AAA TAC 322
Gln Met Val Lys Asn Thr Gly Glu Leu Lys Asn Leu Asn Glu Lys Tyr
20 25 30

GAG CAA TTA AGC CAG TAT TTA AAT CAA GTG GCT TCG TTG AAG CAA AGC 370
Glu Gln Leu Ser Gln Tyr Leu Asn Gln Val Ala Ser Leu Lys Gln Ser
35 40 45

ATT CAA AAC GCC AAC AAC ATT GAG CTG GTC AAT AGC TCT TTA AAC TAT 418
Ile Gln Asn Ala Asn Asn Ile Glu Leu Val Asn Ser Ser Leu Asn Tyr
50 55 60 65

TTA AAA AGC TTT ACC AAC AAC AAC TAT AAC AGC ACC ACC CAA TCG CCC 466
Leu Lys Ser Phe Thr Asn Asn Asn Tyr Asn Ser Thr Thr Gln Ser Pro
70 75 80

ATC TTT AAT GCC GTG CAA GCC GTT ATC ACT TCG GTA TTG GGT TTT TGG 514
Ile Phe Asn Ala Val Gln Ala Val Ile Thr Ser Val Leu Gly Phe Trp
85 90 95

AGT CTT TAT GCG GGG AAT TAC TTC ACT TTT TTT GTG GGT AAA AAG GTG 562
Ser Leu Tyr Ala Gly Asn Tyr Phe Thr Phe Phe Val Gly Lys Lys Val
100 105 110

GGT GAT AGT GGG CAA CCC GCT AGT GTC CAG GGT AAC CCT CCT TTT AAA 610
Gly Asp Ser Gly Gln Pro Ala Ser Val Gln Gly Asn Pro Pro Phe Lys
115 120 125

ACG ATT ATA GAG AAC TGC TCA GGA ATT GAA AAC TGC GCT ATG GAT CAA 658
Thr Ile Ile Glu Asn Cys Ser Gly Ile Glu Asn Cys Ala Met Asp Gln
130 135 140 145

ACC ACT TAT GAT AAG ATG AAA AAA CTC GCT GAA GAC CTC CAA GCG GCT 706
Thr Thr Tyr Asp Lys Met Lys Lys Leu Ala Glu Asp Leu Gln Ala Ala
150 155 160

CAA ACA AAC TCT GCC ACT AAA GGC AAC AAT CTT TGC GCT TTA TCC GGG 754
Gln Thr Asn Ser Ala Thr Lys Gly Asn Asn Leu Cys Ala Leu Ser Gly
165 170 175

TGT GCT GCA ACA GAC TCA ACA TCA AAC CCA CCA AAC TCA ACC GTG AGC 802
Cys Ala Ala Thr Asp Ser Thr Ser Asn Pro Pro Asn Ser Thr Val Ser
180 185 190

AAC GCT CTT AAT TTG GCG CAA CAG CTT ATG GAT TTA ATC GCA AAC ACT 850
Asn Ala Leu Asn Leu Ala Gln Gln Leu Met Asp Leu Ile Ala Asn Thr
195 200 205

AAA ACG GCT ATG ATG TGG AAA AAT ATC GTC ATC AGT GGC GTT TCA AAC 898
Lys Thr Ala Met Met Trp Lys Asn Ile Val Ile Ser Gly Val Ser Asn
210 215 220 225

ACA TCC GGT GCT ATC ACA TCC ACT AAT TAC CCA ACG CAA TAC GCG GTG 946
Thr Ser Gly Ala Ile Thr Ser Thr Asn Tyr Pro Thr Gln Tyr Ala Val
230 235 240

TTT AAC AAC ATT AAG GCG ATG ATA CCC ATT TTG CAA CAA GCG GTT ACG 994
Phe Asn Asn Ile Lys Ala Met Ile Pro Ile Leu Gln Gln Ala Val Thr
245 250 255

CTT TCT CAA AGC AAC CAC ACC CTA TCT GCT AGC TTG CAA GCT CAA GCC 1042
Leu Ser Gln Ser Asn His Thr Leu Ser Ala Ser Leu Gln Ala Gln Ala
260 265 270

ACA GGA TCT CAA ACA AAC CCT AAA TTC GCT AAA GAC ATC TAC ACT TTC 1090
Thr Gly Ser Gln Thr Asn Pro Lys Phe Ala Lys Asp Ile Tyr Thr Phe
275 280 285

GCT CAA AAC CAA AAG CAA GTC ATC TCT TAC GCT CAA GAC ATT TTC AAC 1138
Ala Gln Asn Gln Lys Gln Val Ile Ser Tyr Ala Gln Asp Ile Phe Asn
290 295 300 305

CTC TTT AAT TCT ATC CCT GCA GAG CAG TAT AAG TAT CTA GAG AAA GCT 1186
Leu Phe Asn Ser Ile Pro Ala Glu Gln Tyr Lys Tyr Leu Glu Lys Ala
310 315 320

TAC TTG AAA ATA CCC AAT GCG GGT TCA ACG CCT ACT AAC CCT TAC AGA 1234
Tyr Leu Lys Ile Pro Asn Ala Gly Ser Thr Pro Thr Asn Pro Tyr Arg
325 330 335

CAA GTG GTG AAT TTA AAC CAA GAA GTT CAG ACG ATT AAA AAC AAT GTG 1282
Gln Val Val Asn Leu Asn Gln Glu Val Gln Thr Ile Lys Asn Asn Val
340 345 350

AGT TAT TAT GGT AAC CGG GTG GAT GCG GCT TTA AGC GTG GCT AGA GAT 1330
Ser Tyr Tyr Gly Asn Arg Val Asp Ala Ala Leu Ser Val Ala Arg Asp
355 360 365

GTT TAT AAC CTA AAA TCC AAT CAA GCA GAA ATC GTA ACC GCC TAT AAC 1378
Val Tyr Asn Leu Lys Ser Asn Gln Ala Glu Ile Val Thr Ala Tyr Asn
370 375 380 385

GAC GCT AAG ACT TTG AGC GAA GAG ATT TCT AAA CTC CCG CAC AAT CAA 1426
Asp Ala Lys Thr Leu Ser Glu Glu Ile Ser Lys Leu Pro His Asn Gln
390 395 400

GTC AAT ACA AAA GAC ATT GTT ACA CTA CCT TAC GAT AAA AAC GCC CCA 1474
Val Asn Thr Lys Asp Ile Val Thr Leu Pro Tyr Asp Lys Asn Ala Pro
405 410 415

GCA GCA GGC CAA TCC AAC TAC CAA ATC AAC CCA GAG CAG CAA TCC AAT 1522
Ala Ala Gly Gln Ser Asn Tyr Gln Ile Asn Pro Glu Gln Gln Ser Asn
420 425 430

CTT AAC CAA GCT TTA GCA GCG ATG AGC AAT AAC CCC TTT AAA AAA GTG 1570
Leu Asn Gln Ala Leu Ala Ala Met Ser Asn Asn Pro Phe Lys Lys Val
435 440 445

GGC ATG ATC AGC TCT CAA AAC AAT AAC GGC GCT TTG AAC GGG CTT GGC 1618
Gly Met Ile Ser Ser Gln Asn Asn Asn Gly Ala Leu Asn Gly Leu Gly
450 455 460 465

GTG CAA GTG GGT TAT AAG CAA TTC TTT GGC GAA AGC AAA AGA TGG GGG 1666
Val Gln Val Gly Tyr Lys Gln Phe Phe Gly Glu Ser Lys Arg Trp Gly
470 475 480

TTA AGG TAT TAC GGA TTC TTT GAT TAC AAC CAC GGC TAC ATC AAA TCC 1714
Leu Arg Tyr Tyr Gly Phe Phe Asp Tyr Asn His Gly Tyr Ile Lys Ser
485 490 495

AGC TTC TTT AAC TCT TCT TCT GAT ATA TGG ACT TAT GGC GGT GGG AGC 1762
Ser Phe Phe Asn Ser Ser Ser Asp Ile Trp Thr Tyr Gly Gly Gly Ser
500 505 510

GAT TTG TTA GTG AAT ATT ATC AAC GAT AGC ATC ACA AGA AAG AAC AAC 1810
Asp Leu Leu Val Asn Ile Ile Asn Asp Ser Ile Thr Arg Lys Asn Asn
515 520 525

AAG CTC TCC GTG GGT CTT TTT GGA GGC ATC CAA CTA GCA GGG ACT ACA 1858
Lys Leu Ser Val Gly Leu Phe Gly Gly Ile Gln Leu Ala Gly Thr Thr
530 535 540 545

TGG CTT AAT TCT CAA TAC GTG AAT TTA ACC GCG TTC AAT AAC CCT TAC 1906
Trp Leu Asn Ser Gln Tyr Val Asn Leu Thr Ala Phe Asn Asn Pro Tyr
550 555 560

AGC GCG AAA GTC AAT GCT ACC AAT TTC CAA TTC TTG TTC AAT CTC GGC 1954
Ser Ala Lys Val Asn Ala Thr Asn Phe Gln Phe Leu Phe Asn Leu Gly
565 570 575

TTG AGG ACG AAT CTC GCT ACA GCT AGG AAA AAA GAC AGC GAA CAT TCC 2002
Leu Arg Thr Asn Leu Ala Thr Ala Arg Lys Lys Asp Ser Glu His Ser
580 585 590

GCG CAA CAT GGC ATT GAA TTG GGT ATT AAA ATC CCC ACC ATT ACC ACG 2050
Ala Gln His Gly Ile Glu Leu Gly Ile Lys Ile Pro Thr Ile Thr Thr
595 600 605

AAT TAC TAT TCT TTT CTA GGC ACT CAA TTG CAA TAC AGA AGG CTC TAT 2098
Asn Tyr Tyr Ser Phe Leu Gly Thr Gln Leu Gln Tyr Arg Arg Leu Tyr
610 615 620 625

AGC GTG TAT CTC AAT TAT GTG TTC GCT TAC TGAGTGATTC AAGCTCTCTT CTT 2151
Ser Val Tyr Leu Asn Tyr Val Phe Ala Tyr
630 635

TAAGGGGGTT TAGAAAAATC GCAACGCCAA GCTTTTTATC GTTGGTGATA AAATCTACAA 2211
AACTAACGGC GCGACAACAA ACCCTAACGC TACGCTC 2248

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 652 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(ix) FEATURE:

(A) NAME/KEY: Signal Sequence

(B) LOCATION: 1...17
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Met Lys Lys Thr Ile Leu Leu Ser Leu Ser Ala Ser Ser Leu Leu His
-15 -10 -5

Ala Glu Asp Asn Gly Phe Phe Val Ser Ala Gly Tyr Gln Ile Gly Glu
1 5 10 15

Ala Val Gln Met Val Lys Asn Thr Gly Glu Leu Lys Asn Leu Asn Glu
20 25 30

Lys Tyr Glu Gln Leu Ser Gln Tyr Leu Asn Gln Val Ala Ser Leu Lys
35 40 45

Gln Ser Ile Gln Asn Ala Asn Asn Ile Glu Leu Val Asn Ser Ser Leu
50 55 60

Asn Tyr Leu Lys Ser Phe Thr Asn Asn Asn Tyr Asn Ser Thr Thr Gln
65 70 75

Ser Pro Ile Phe Asn Ala Val Gln Ala Val Ile Thr Ser Val Leu Gly
80 85 90 95

Phe Trp Ser Leu Tyr Ala Gly Asn Tyr Phe Thr Phe Phe Val Gly Lys
100 105 110

Lys Val Gly Asp Ser Gly Gln Pro Ala Ser Val Gln Gly Asn Pro Pro
115 120 125

Phe Lys Thr Ile Ile Glu Asn Cys Ser Gly Ile Glu Asn Cys Ala Met
130 135 140

Asp Gln Thr Thr Tyr Asp Lys Met Lys Lys Leu Ala Glu Asp Leu Gln
145 150 155

Ala Ala Gln Thr Asn Ser Ala Thr Lys Gly Asn Asn Leu Cys Ala Leu
160 165 170 175

Ser Gly Cys Ala Ala Thr Asp Ser Thr Ser Asn Pro Pro Asn Ser Thr
180 185 190

Val Ser Asn Ala Leu Asn Leu Ala Gln Gln Leu Met Asp Leu Ile Ala
195 200 205

Asn Thr Lys Thr Ala Met Met Trp Lys Asn Ile Val Ile Ser Gly Val
210 215 220

Ser Asn Thr Ser Gly Ala Ile Thr Ser Thr Asn Tyr Pro Thr Gln Tyr
225 230 235

Ala Val Phe Asn Asn Ile Lys Ala Met Ile Pro Ile Leu Gln Gln Ala
240 245 250 255

Val Thr Leu Ser Gln Ser Asn His Thr Leu Ser Ala Ser Leu Gln Ala
260 265 270

Gln Ala Thr Gly Ser Gln Thr Asn Pro Lys Phe Ala Lys Asp Ile Tyr
275 280 285

Thr Phe Ala Gln Asn Gln Lys Gln Val Ile Ser Tyr Ala Gln Asp Ile
290 295 300

Phe Asn Leu Phe Asn Ser Ile Pro Ala Glu Gln Tyr Lys Tyr Leu Glu
305 310 315

Lys Ala Tyr Leu Lys Ile Pro Asn Ala Gly Ser Thr Pro Thr Asn Pro
320 325 330 335

Tyr Arg Gln Val Val Asn Leu Asn Gln Glu Val Gln Thr Ile Lys Asn
340 345 350

Asn Val Ser Tyr Tyr Gly Asn Arg Val Asp Ala Ala Leu Ser Val Ala
355 360 365

Arg Asp Val Tyr Asn Leu Lys Ser Asn Gln Ala Glu Ile Val Thr Ala
370 375 380

Tyr Asn Asp Ala Lys Thr Leu Ser Glu Glu Ile Ser Lys Leu Pro His
385 390 395

Asn Gln Val Asn Thr Lys Asp Ile Val Thr Leu Pro Tyr Asp Lys Asn
400 405 410 415

Ala Pro Ala Ala Gly Gln Ser Asn Tyr Gln Ile Asn Pro Glu Gln Gln
420 425 430

Ser Asn Leu Asn Gln Ala Leu Ala Ala Met Ser Asn Asn Pro Phe Lys
435 440 445

Lys Val Gly Met Ile Ser Ser Gln Asn Asn Asn Gly Ala Leu Asn Gly
450 455 460

Leu Gly Val Gln Val Gly Tyr Lys Gln Phe Phe Gly Glu Ser Lys Arg
465 470 475

Trp Gly Leu Arg Tyr Tyr Gly Phe Phe Asp Tyr Asn His Gly Tyr Ile
480 485 490 495

Lys Ser Ser Phe Phe Asn Ser Ser Ser Asp Ile Trp Thr Tyr Gly Gly
500 505 510

Gly Ser Asp Leu Leu Val Asn Ile Ile Asn Asp Ser Ile Thr Arg Lys
515 520 525

Asn Asn Lys Leu Ser Val Gly Leu Phe Gly Gly Ile Gln Leu Ala Gly
530 535 540

Thr Thr Trp Leu Asn Ser Gln Tyr Val Asn Leu Thr Ala Phe Asn Asn
545 550 555

Pro Tyr Ser Ala Lys Val Asn Ala Thr Asn Phe Gln Phe Leu Phe Asn
560 565 570 575

Leu Gly Leu Arg Thr Asn Leu Ala Thr Ala Arg Lys Lys Asp Ser Glu
580 585 590

His Ser Ala Gln His Gly Ile Glu Leu Gly Ile Lys Ile Pro Thr Ile
595 600 605

Thr Thr Asn Tyr Tyr Ser Phe Leu Gly Thr Gln Leu Gln Tyr Arg Arg
610 615 620

Leu Tyr Ser Val Tyr Leu Asn Tyr Val Phe Ala Tyr
625 630 635

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2161 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA
(ix) FEATURE:

(A) NAME/KEY: Coding Sequence

(B) LOCATION: 122...2056
(D) OTHER INFORMATION:

(A) NAME/KEY: Signal Sequence

(B) LOCATION: 122...179
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

CAAAAATCTT TTTTTTTTTT TTTTGAAATC CAATAAATTT ATGGTAAAGT TAAACATATT 60
GTAAATAAAT TTTAATTTCT ATTCATGTTT ACAATAAAAA AATTACTTTA AGGAACATTT 120

T ATG AAA AAG ACA ATT CTA CTC TCT CTC TCT CTC TCG CTT TCA TCG CTC 169
Met Lys Lys Thr Ile Leu Leu Ser Leu Ser Leu Ser Leu Ser Ser Leu
-15 -10 -5

TTG CAC GCT GAA GAC AAC GGC TTT TTT GTG AGC GCC GGC TAT CAA ATC 217
Leu His Ala Glu Asp Asn Gly Phe Phe Val Ser Ala Gly Tyr Gln Ile
1 5 10

GGC GAA CGG GTG CAA ATG GTC AAA AAC ACC GGC GAA TTG AAA AAC TTG 265
Gly Glu Arg Val Gln Met Val Lys Asn Thr Gly Glu Leu Lys Asn Leu
15 20 25

AAC GAA AAA TAC GAG CAA TTA AGC CAA TCT TTA GCC CAA CTG GCT TCG 313
Asn Glu Lys Tyr Glu Gln Leu Ser Gln Ser Leu Ala Gln Leu Ala Ser
30 35 40 45

TTA AAA AAA AGC ATT CAA ACG GCG AAC AAC ATT CAG GCT GTC AAC AAT 361
Leu Lys Lys Ser Ile Gln Thr Ala Asn Asn Ile Gln Ala Val Asn Asn
50 55 60

GCT TTA AGC GAT TTA AAA AGC TTT GCG AGT AAC AAC CAC ACA AAC AAA 409
Ala Leu Ser Asp Leu Lys Ser Phe Ala Ser Asn Asn His Thr Asn Lys
65 70 75

GAA ACA TCG CCC ATC TAC AAC ACC GCG CAA GCT GTT ATC ACT TCA GTA 457
Glu Thr Ser Pro Ile Tyr Asn Thr Ala Gln Ala Val Ile Thr Ser Val
80 85 90

TTG GCT TTT TGG AGT CTT TAT GCA GGG AAC GCT ACC AGT TTT CAT GTG 505
Leu Ala Phe Trp Ser Leu Tyr Ala Gly Asn Ala Thr Ser Phe His Val
95 100 105

ACC GGT TTG AAT GAT GGA TCT AAT GCT CCT CTT GGA AGA ATC CAT CAA 553
Thr Gly Leu Asn Asp Gly Ser Asn Ala Pro Leu Gly Arg Ile His Gln
110 115 120 125

GAT GGG AAC TGC ACA GGA TTA CAA CAA TGT TTT ATG AAT AAA GAA ACT 601
Asp Gly Asn Cys Thr Gly Leu Gln Gln Cys Phe Met Asn Lys Glu Thr
130 135 140

TAT GAT AAA ATG AAA GCG CTT GCC GAA AAT CTC CAA AAA GCT CAA GGC 649
Tyr Asp Lys Met Lys Ala Leu Ala Glu Asn Leu Gln Lys Ala Gln Gly
145 150 155

AAT CTC TGT GCC TTA TCA GAA TGC CCT AGC GAT CAA TTA AAT GGA AAC 697
Asn Leu Cys Ala Leu Ser Glu Cys Pro Ser Asp Gln Leu Asn Gly Asn
160 165 170

AAT GGA AAC AAA ACT TCC ATG ACT AAA GCT CTT GAA ACC GCG CAA CAG 745
Asn Gly Asn Lys Thr Ser Met Thr Lys Ala Leu Glu Thr Ala Gln Gln
175 180 185






CTT ATG GAT TTA ATC GCA AAC ACT AAA ACG GCT ATG ATG TGG AAA AAT 793
Leu Met Asp Leu Ile Ala Asn Thr Lys Thr Ala Met Met Trp Lys Asn
190 195 200 205

ATC GTC ATC GCA GGT GTT ACA AAC AGA CCC GGT GGT GCT GGC GCT ATC 841
Ile Val Ile Ala Gly Val Thr Asn Arg Pro Gly Gly Ala Gly Ala Ile
210 215 220

ACA TCC ACT GGT CCT GTA ACC GAC TAT GCG GTG TTT AAC AAC ATT AAG 889
Thr Ser Thr Gly Pro Val Thr Asp Tyr Ala Val Phe Asn Asn Ile Lys
225 230 235

GCG ATG ATA CCC ATT TTG CAA CAA GCG GTT ACG CTT TCT CAA AGC AAC 937
Ala Met Ile Pro Ile Leu Gln Gln Ala Val Thr Leu Ser Gln Ser Asn
240 245 250

CAC ACC CTA TCT GCT AGC TTG CAA GCT CAA GCC ACA GGA TCT CAA ACA 985
His Thr Leu Ser Ala Ser Leu Gln Ala Gln Ala Thr Gly Ser Gln Thr
255 260 265

AAC CCT AAA TTC GCT AAA GAC ATC TAC ACT TTC GCT CAA AAC CAA AAG 1033
Asn Pro Lys Phe Ala Lys Asp Ile Tyr Thr Phe Ala Gln Asn Gln Lys
270 275 280 285

CAA GTC ATC TCT TAC GCT CAA GAC ATT TTC AAC CTC TTT AAT TCT ATC 1081
Gln Val Ile Ser Tyr Ala Gln Asp Ile Phe Asn Leu Phe Asn Ser Ile
290 295 300

CCT GCA GAG CAG TAT AAG TAT CTA GAG AAA GCT TAC TTG AAA ATA CCC 1129
Pro Ala Glu Gln Tyr Lys Tyr Leu Glu Lys Ala Tyr Leu Lys Ile Pro
305 310 315

AAT GCG GGT TCA ACG CCT ACT AAC CCT TAC AGA CAA GTG GTG AAT TTA 1177
Asn Ala Gly Ser Thr Pro Thr Asn Pro Tyr Arg Gln Val Val Asn Leu
320 325 330

AAC CAA GAA GTT CAG ACG ATT AAA AAC AAT GTG AGT TAT TAT GGT AAC 1225
Asn Gln Glu Val Gln Thr Ile Lys Asn Asn Val Ser Tyr Tyr Gly Asn
335 340 345

CGG GTG GAT GCG GCT TTA AGC GTG GCT AGA GAT GTT TAT AAC CTA AAA 1273
Arg Val Asp Ala Ala Leu Ser Val Ala Arg Asp Val Tyr Asn Leu Lys
350 355 360 365

TCC AAT CAA GCA GAA ATC GTA ACC GCC TAT AAC GAC GCT AAG ACT TTG 1321
Ser Asn Gln Ala Glu Ile Val Thr Ala Tyr Asn Asp Ala Lys Thr Leu
370 375 380

AGC GAA GAG ATT TCT AAA CTC CCG CAC AAT CAA GTC AAT ACA AAA GAC 1369
Ser Glu Glu Ile Ser Lys Leu Pro His Asn Gln Val Asn Thr Lys Asp
385 390 395

ATT GTT ACA CTA CCT TAC GAT AAA AAC GCC CCA GCA GCA GGC CAA TCC 1417
Ile Val Thr Leu Pro Tyr Asp Lys Asn Ala Pro Ala Ala Gly Gln Ser
400 405 410

AAC TAC CAA ATC AAC CCA GAG CAG CAA TCC AAT CTT AAC CAA GCT TTA 1465
Asn Tyr Gln Ile Asn Pro Glu Gln Gln Ser Asn Leu Asn Gln Ala Leu
415 420 425

GCA GCG ATG AGC AAT AAC CCC TTT AAA AAA GTG GGC ATG ATC AGC TCT 1513
Ala Ala Met Ser Asn Asn Pro Phe Lys Lys Val Gly Met Ile Ser Ser
430 435 440 445

CAA AAC AAT AAC GGC GCT TTG AAC GGG CTT GGC GTG CAA GTG GGT TAT 1561
Gln Asn Asn Asn Gly Ala Leu Asn Gly Leu Gly Val Gln Val Gly Tyr
450 455 460

AAG CAA TTC TTT GGC GAA AGC AAA AGA TGG GGG TTA AGG TAT TAC GGA 1609
Lys Gln Phe Phe Gly Glu Ser Lys Arg Trp Gly Leu Arg Tyr Tyr Gly
465 470 475

TTC TTT GAT TAC AAC CAC GGC TAC ATC AAA TCC AGC TTC TTT AAC TCT 1657
Phe Phe Asp Tyr Asn His Gly Tyr Ile Lys Ser Ser Phe Phe Asn Ser
480 485 490

TCT TCT GAT ATA TGG ACT TAT GGC GGT GGG AGC GAT TTG TTA GTG AAT 1705
Ser Ser Asp Ile Trp Thr Tyr Gly Gly Gly Ser Asp Leu Leu Val Asn
495 500 505

ATT ATC AAC GAT AGC ATC ACA AGA AAG AAC AAC AAG CTC TCC GTG GGT 1753
Ile Ile Asn Asp Ser Ile Thr Arg Lys Asn Asn Lys Leu Ser Val Gly
510 515 520 525

CTT TTT GGA GGC ATC CAA CTA GCA GGG ACT ACA TGG CTT AAT TCT CAA 1801
Leu Phe Gly Gly Ile Gln Leu Ala Gly Thr Thr Trp Leu Asn Ser Gln
530 535 540

TAC GTG AAT TTA ACC GCG TTC AAT AAC CCT TAC AGC GCG AAA GTC AAT 1849
Tyr Val Asn Leu Thr Ala Phe Asn Asn Pro Tyr Ser Ala Lys Val Asn
545 550 555

GCT ACC AAT TTC CAA TTC TTG TTC AAT CTC GGC TTG AGG ACG AAT CTC 1897
Ala Thr Asn Phe Gln Phe Leu Phe Asn Leu Gly Leu Arg Thr Asn Leu
560 565 570

GCT ACA GCT AGG AAA AAA GAC AGC GAA CAT TCC GCG CAA CAT GGC ATT 1945
Ala Thr Ala Arg Lys Lys Asp Ser Glu His Ser Ala Gln His Gly Ile
575 580 585

GAA TTG GGT ATT AAA ATC CCC ACC ATT ACC ACG AAT TAC TAT TCT TTT 1993
Glu Leu Gly Ile Lys Ile Pro Thr Ile Thr Thr Asn Tyr Tyr Ser Phe
590 595 600 605

CTA GGC ACT CAA TTG CAA TAC AGA AGG CTC TAT AGC GTG TAT CTC AAT 2041
Leu Gly Thr Gln Leu Gln Tyr Arg Arg Leu Tyr Ser Val Tyr Leu Asn
610 615 620

TAT GTG TTC GCT TAT TAAAAAATCT TCTTTTTAAA ATAGGGGGAG CTTCATCAAA T 2097
Tyr Val Phe Ala Tyr
625

CTATTTTGAT AGTTATCAAT ATTTGATGAA AATAAAGTCA AAAACAAAAT AAACCAAATC 2157
ACCC 2161

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 645 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(ix) FEATURE:

(A) NAME/KEY: Signal Sequence

(B) LOCATION: 1...19
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

Met Lys Lys Thr Ile Leu Leu Ser Leu Ser Leu Ser Leu Ser Ser Leu
-15 -10 -5

Leu His Ala Glu Asp Asn Gly Phe Phe Val Ser Ala Gly Tyr Gln Ile
1 5 10

Gly Glu Arg Val Gln Met Val Lys Asn Thr Gly Glu Leu Lys Asn Leu
15 20 25

Asn Glu Lys Tyr Glu Gln Leu Ser Gln Ser Leu Ala Gln Leu Ala Ser
30 35 40 45

Leu Lys Lys Ser Ile Gln Thr Ala Asn Asn Ile Gln Ala Val Asn Asn
50 55 60

Ala Leu Ser Asp Leu Lys Ser Phe Ala Ser Asn Asn His Thr Asn Lys
65 70 75

Glu Thr Ser Pro Ile Tyr Asn Thr Ala Gln Ala Val Ile Thr Ser Val
80 85 90

Leu Ala Phe Trp Ser Leu Tyr Ala Gly Asn Ala Thr Ser Phe His Val
95 100 105

Thr Gly Leu Asn Asp Gly Ser Asn Ala Pro Leu Gly Arg Ile His Gln
110 115 120 125

Asp Gly Asn Cys Thr Gly Leu Gln Gln Cys Phe Met Asn Lys Glu Thr
130 135 140

Tyr Asp Lys Met Lys Ala Leu Ala Glu Asn Leu Gln Lys Ala Gln Gly
145 150 155

Asn Leu Cys Ala Leu Ser Glu Cys Pro Ser Asp Gln Leu Asn Gly Asn
160 165 170

Asn Gly Asn Lys Thr Ser Met Thr Lys Ala Leu Glu Thr Ala Gln Gln
175 180 185

Leu Met Asp Leu Ile Ala Asn Thr Lys Thr Ala Met Met Trp Lys Asn
190 195 200 205

Ile Val Ile Ala Gly Val Thr Asn Arg Pro Gly Gly Ala Gly Ala Ile
210 215 220

Thr Ser Thr Gly Pro Val Thr Asp Tyr Ala Val Phe Asn Asn Ile Lys
225 230 235

Ala Met Ile Pro Ile Leu Gln Gln Ala Val Thr Leu Ser Gln Ser Asn
240 245 250

His Thr Leu Ser Ala Ser Leu Gln Ala Gln Ala Thr Gly Ser Gln Thr
255 260 265

Asn Pro Lys Phe Ala Lys Asp Ile Tyr Thr Phe Ala Gln Asn Gln Lys
270 275 280 285

Gln Val Ile Ser Tyr Ala Gln Asp Ile Phe Asn Leu Phe Asn Ser Ile
290 295 300

Pro Ala Glu Gln Tyr Lys Tyr Leu Glu Lys Ala Tyr Leu Lys Ile Pro
305 310 315

Asn Ala Gly Ser Thr Pro Thr Asn Pro Tyr Arg Gln Val Val Asn Leu
320 325 330

Asn Gln Glu Val Gln Thr Ile Lys Asn Asn Val Ser Tyr Tyr Gly Asn
335 340 345

Arg Val Asp Ala Ala Leu Ser Val Ala Arg Asp Val Tyr Asn Leu Lys
350 355 360 365

Ser Asn Gln Ala Glu Ile Val Thr Ala Tyr Asn Asp Ala Lys Thr Leu
370 375 380

Ser Glu Glu Ile Ser Lys Leu Pro His Asn Gln Val Asn Thr Lys Asp
385 390 395

Ile Val Thr Leu Pro Tyr Asp Lys Asn Ala Pro Ala Ala Gly Gln Ser
400 405 410

Asn Tyr Gln Ile Asn Pro Glu Gln Gln Ser Asn Leu Asn Gln Ala Leu
415 420 425

Ala Ala Met Ser Asn Asn Pro Phe Lys Lys Val Gly Met Ile Ser Ser
430 435 440 445

Gln Asn Asn Asn Gly Ala Leu Asn Gly Leu Gly Val Gln Val Gly Tyr
450 455 460

Lys Gln Phe Phe Gly Glu Ser Lys Arg Trp Gly Leu Arg Tyr Tyr Gly
465 470 475

Phe Phe Asp Tyr Asn His Gly Tyr Ile Lys Ser Ser Phe Phe Asn Ser
480 485 490

Ser Ser Asp Ile Trp Thr Tyr Gly Gly Gly Ser Asp Leu Leu Val Asn
495 500 505

Ile Ile Asn Asp Ser Ile Thr Arg Lys Asn Asn Lys Leu Ser Val Gly
510 515 520 525

Leu Phe Gly Gly Ile Gln Leu Ala Gly Thr Thr Trp Leu Asn Ser Gln
530 535 540

Tyr Val Asn Leu Thr Ala Phe Asn Asn Pro Tyr Ser Ala Lys Val Asn
545 550 555

Ala Thr Asn Phe Gln Phe Leu Phe Asn Leu Gly Leu Arg Thr Asn Leu
560 565 570

Ala Thr Ala Arg Lys Lys Asp Ser Glu His Ser Ala Gln His Gly Ile
575 580 585

Glu Leu Gly Ile Lys Ile Pro Thr Ile Thr Thr Asn Tyr Tyr Ser Phe
590 595 600 605

Leu Gly Thr Gln Leu Gln Tyr Arg Arg Leu Tyr Ser Val Tyr Leu Asn
610 615 620

Tyr Val Phe Ala Tyr
625

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1799 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA
(ix) FEATURE:

(A) NAME/KEY: Coding Sequence

(B) LOCATION: 185...1633
(D) OTHER INFORMATION:

(A) NAME/KEY: Signal Sequence

(B) LOCATION: 185...233
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

TACTCAAAAC ATTTTTCACT ATCAAAAACC TTTTTTTTAA ATCCAAAAAA AAAGCAAAAT 60
TTCTTAATTT TTGCTCAATT TTATTAAAAA TTCAATAAAT TTATGGCACA ATTTAAACTT 120
ATTGTAAATA AAGTTTCAAT TTGATACGAT TTTACAAACA AAACATTACT TTAAGGAACA 180

TTTT ATG AAA AAA ACG ATT TTA CTT TCT CTT ATG GTT TCA TCG CTC CTC 229
Met Lys Lys Thr Ile Leu Leu Ser Leu Met Val Ser Ser Leu Leu
-15 -10 -5

GCT GAA AAT GAC GGC GTT TTT ATG AGC GTG GGC TAT CAA ATC GGC GAA 277
Ala Glu Asn Asp Gly Val Phe Met Ser Val Gly Tyr Gln Ile Gly Glu
1 5 10 15

GCG GTT CAA CAA GTG AAA AAC ACC GGC GAA ATC CAA AAA GTC TCC AAC 325
Ala Val Gln Gln Val Lys Asn Thr Gly Glu Ile Gln Lys Val Ser Asn
20 25 30

GCT TAC GAA AAT TTG AAC AAT CTT TTA ACC CGC TAT AAC GAA CTC AAA 373
Ala Tyr Glu Asn Leu Asn Asn Leu Leu Thr Arg Tyr Asn Glu Leu Lys
35 40 45

CAA ACG GCC TCT AAC ACC AAT TCA AGT ACC GCT CAA GCG ATT GAT AAT 421
Gln Thr Ala Ser Asn Thr Asn Ser Ser Thr Ala Gln Ala Ile Asp Asn
50 55 60

CTA AAA GAG AGC GCT AGC CGA TTG AAA ACG ACC CCC AAT AGC GCT AAT 469
Leu Lys Glu Ser Ala Ser Arg Leu Lys Thr Thr Pro Asn Ser Ala Asn
65 70 75

CAA GCC GTG TCT TCA GCG CTC AGC TCT GCG GTA GCC ATG TGG CAA GTA 517
Gln Ala Val Ser Ser Ala Leu Ser Ser Ala Val Ala Met Trp Gln Val
80 85 90 95

ATA GTC TCT AAT TTA GCC AAT AAC TCG CTA CCC ACT AGT GAA TAC AAC 565
Ile Val Ser Asn Leu Ala Asn Asn Ser Leu Pro Thr Ser Glu Tyr Asn
100 105 110

AAA ATC AAT GCG ATT TCT CAA TCG CTC CAA AAC ACC CTA GAA AAT AAA 613
Lys Ile Asn Ala Ile Ser Gln Ser Leu Gln Asn Thr Leu Glu Asn Lys
115 120 125

AAC AAT GAT CTT AAA ATT GAA AAT GAC TAC GAC CAT CTT TTA ACT CAA 661
Asn Asn Asp Leu Lys Ile Glu Asn Asp Tyr Asp His Leu Leu Thr Gln
130 135 140

GCT AGC ACC ATT ATT AAT ACC CTT CAA AGC CAA TGC CCA GGC ATA GAC 709
Ala Ser Thr Ile Ile Asn Thr Leu Gln Ser Gln Cys Pro Gly Ile Asp
145 150 155

GGA GGC AAT GGC AAA CCA TGG GGC ATT AAT GCA AGC GGG AAC GCA TGC 757
Gly Gly Asn Gly Lys Pro Trp Gly Ile Asn Ala Ser Gly Asn Ala Cys
160 165 170 175

AAT ATT TTT GGC AAC ACC TTT AAC GCC ATC ACT AGC ATG ATA GAT AGC 805
Asn Ile Phe Gly Asn Thr Phe Asn Ala Ile Thr Ser Met Ile Asp Ser
180 185 190

GCT AAA AAA GCC GCC GCA GAT GCC CGA AGA ACT GCC CCA GAA AGT CCA 853
Ala Lys Lys Ala Ala Ala Asp Ala Arg Arg Thr Ala Pro Glu Ser Pro
195 200 205

AAC CAA CCA AGT GCG TTT AAC AAC GCT GAT TTC AAT AAA AAC CTT AAT 901
Asn Gln Pro Ser Ala Phe Asn Asn Ala Asp Phe Asn Lys Asn Leu Asn
210 215 220

CAA GTC TCA AGC GTT ATT AAT GAC ACG ATC TCT TAC CTC AAA GGG GAC 949
Gln Val Ser Ser Val Ile Asn Asp Thr Ile Ser Tyr Leu Lys Gly Asp
225 230 235

AAT TTA GCA ACC ATC TAC AAC ACC CTT CAA AAA ACG CCC GAT TCT AAA 997
Asn Leu Ala Thr Ile Tyr Asn Thr Leu Gln Lys Thr Pro Asp Ser Lys
240 245 250 255

GGG TTT CAA AGT TTG GTG AGC CGA TCT AGC TAT AGT TAT TCC CTC AAC 1045
Gly Phe Gln Ser Leu Val Ser Arg Ser Ser Tyr Ser Tyr Ser Leu Asn
260 265 270

GAA ACC CAA TAT TCT GAA TTC CAA ACT ACC ACC AAA GAG TTT GGC CAT 1093
Glu Thr Gln Tyr Ser Glu Phe Gln Thr Thr Thr Lys Glu Phe Gly His
275 280 285

AAC CCT TTT AGA AGC GTG GGT TTA ATC AAC TCT CAA AGC AAT AAC GGA 1141
Asn Pro Phe Arg Ser Val Gly Leu Ile Asn Ser Gln Ser Asn Asn Gly
290 295 300

GCG ATG AAT GGC GTG GGC GTG CAA TTA GGC TAT AAG CAA TTC TTT GGG 1189
Ala Met Asn Gly Val Gly Val Gln Leu Gly Tyr Lys Gln Phe Phe Gly
305 310 315

AAA AAT AAA TTT TTT GGG ATC CGT TAT TAT GCC TTT TTT GAT TAC AAC 1237
Lys Asn Lys Phe Phe Gly Ile Arg Tyr Tyr Ala Phe Phe Asp Tyr Asn
320 325 330 335

CAT GCC TAT ATC AAA TCC AAC TTT TTC AAC TCC GCT TCC AAT GTT TTC 1285
His Ala Tyr Ile Lys Ser Asn Phe Phe Asn Ser Ala Ser Asn Val Phe
340 345 350

ACT TAT GGC GCA GGC AGT GAT CTT TTA TTG AAT TTC ATC AAT GGC GGA 1333
Thr Tyr Gly Ala Gly Ser Asp Leu Leu Leu Asn Phe Ile Asn Gly Gly
355 360 365

TCC GAT AAA AAC CGC AAA GTC TCT TTT GGC ATT TTT GGA GGC ATC GCT 1381
Ser Asp Lys Asn Arg Lys Val Ser Phe Gly Ile Phe Gly Gly Ile Ala
370 375 380

CTA GCA GGC ACG ACA TGG CTT AAT TCC CAA TTT ATG AAT TTA AAA ACC 1429
Leu Ala Gly Thr Thr Trp Leu Asn Ser Gln Phe Met Asn Leu Lys Thr
385 390 395

ACC AAT AGC GCC TAC AGC GCT AAG ATC AAC AAC ACC AAT TTC CAA TTC 1477
Thr Asn Ser Ala Tyr Ser Ala Lys Ile Asn Asn Thr Asn Phe Gln Phe
400 405 410 415

TTA TTC AAT ACT GGT TTA AGG CTT CAA GGG ATT CAC CAT GGC GTT GAA 1525
Leu Phe Asn Thr Gly Leu Arg Leu Gln Gly Ile His His Gly Val Glu
420 425 430

TTA GGC GTG AAA ATC CCC ACC ATC AAC ACG AAT TAC TAT TCT TTC ATG 1573
Leu Gly Val Lys Ile Pro Thr Ile Asn Thr Asn Tyr Tyr Ser Phe Met
435 440 445

GGC GCT AAA TTA GCA TAC CGA AGA CTT TAT AGC GTG TAT TTC AAT TAT 1621
Gly Ala Lys Leu Ala Tyr Arg Arg Leu Tyr Ser Val Tyr Phe Asn Tyr
450 455 460

GTT TTG GCC TAT TGATATTGAA TCGGTTCTCA TTACTAATGA GGACAAAGCC AAACT 1678
Val Leu Ala Tyr
465

TTTTGGCTCT CAATGAATAA CGGCATCATT TTACTTGACT TTTTACAAAA AACACACTAA 1738
AATTTCTTTT TCTTTTTTGA GCGAAATTCC AGATTAGCTC AGCGGTAGAG TAGGCGGCTG 1798
T 1799

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 483 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(ix) FEATURE:

(A) NAME/KEY: Signal Sequence

(B) LOCATION: 1...16
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

Met Lys Lys Thr Ile Leu Leu Ser Leu Met Val Ser Ser Leu Leu Ala
-15 -10 -5

Glu Asn Asp Gly Val Phe Met Ser Val Gly Tyr Gln Ile Gly Glu Ala
1 5 10 15

Val Gln Gln Val Lys Asn Thr Gly Glu Ile Gln Lys Val Ser Asn Ala
20 25 30

Tyr Glu Asn Leu Asn Asn Leu Leu Thr Arg Tyr Asn Glu Leu Lys Gln
35 40 45

Thr Ala Ser Asn Thr Asn Ser Ser Thr Ala Gln Ala Ile Asp Asn Leu
50 55 60

Lys Glu Ser Ala Ser Arg Leu Lys Thr Thr Pro Asn Ser Ala Asn Gln
65 70 75 80

Ala Val Ser Ser Ala Leu Ser Ser Ala Val Ala Met Trp Gln Val Ile
85 90 95

Val Ser Asn Leu Ala Asn Asn Ser Leu Pro Thr Ser Glu Tyr Asn Lys
100 105 110

Ile Asn Ala Ile Ser Gln Ser Leu Gln Asn Thr Leu Glu Asn Lys Asn
115 120 125

Asn Asp Leu Lys Ile Glu Asn Asp Tyr Asp His Leu Leu Thr Gln Ala
130 135 140

Ser Thr Ile Ile Asn Thr Leu Gln Ser Gln Cys Pro Gly Ile Asp Gly
145 150 155 160

Gly Asn Gly Lys Pro Trp Gly Ile Asn Ala Ser Gly Asn Ala Cys Asn
165 170 175

Ile Phe Gly Asn Thr Phe Asn Ala Ile Thr Ser Met Ile Asp Ser Ala
180 185 190

Lys Lys Ala Ala Ala Asp Ala Arg Arg Thr Ala Pro Glu Ser Pro Asn
195 200 205

Gln Pro Ser Ala Phe Asn Asn Ala Asp Phe Asn Lys Asn Leu Asn Gln
210 215 220

Val Ser Ser Val Ile Asn Asp Thr Ile Ser Tyr Leu Lys Gly Asp Asn
225 230 235 240

Leu Ala Thr Ile Tyr Asn Thr Leu Gln Lys Thr Pro Asp Ser Lys Gly
245 250 255

Phe Gln Ser Leu Val Ser Arg Ser Ser Tyr Ser Tyr Ser Leu Asn Glu
260 265 270

Thr Gln Tyr Ser Glu Phe Gln Thr Thr Thr Lys Glu Phe Gly His Asn
275 280 285

Pro Phe Arg Ser Val Gly Leu Ile Asn Ser Gln Ser Asn Asn Gly Ala
290 295 300

Met Asn Gly Val Gly Val Gln Leu Gly Tyr Lys Gln Phe Phe Gly Lys
305 310 315 320

Asn Lys Phe Phe Gly Ile Arg Tyr Tyr Ala Phe Phe Asp Tyr Asn His
325 330 335

Ala Tyr Ile Lys Ser Asn Phe Phe Asn Ser Ala Ser Asn Val Phe Thr
340 345 350

Tyr Gly Ala Gly Ser Asp Leu Leu Leu Asn Phe Ile Asn Gly Gly Ser
355 360 365

Asp Lys Asn Arg Lys Val Ser Phe Gly Ile Phe Gly Gly Ile Ala Leu
370 375 380

Ala Gly Thr Thr Trp Leu Asn Ser Gln Phe Met Asn Leu Lys Thr Thr
385 390 395 400

Asn Ser Ala Tyr Ser Ala Lys Ile Asn Asn Thr Asn Phe Gln Phe Leu
405 410 415

Phe Asn Thr Gly Leu Arg Leu Gln Gly Ile His His Gly Val Glu Leu
420 425 430

Gly Val Lys Ile Pro Thr Ile Asn Thr Asn Tyr Tyr Ser Phe Met Gly
435 440 445

Ala Lys Leu Ala Tyr Arg Arg Leu Tyr Ser Val Tyr Phe Asn Tyr Val
450 455 460

Leu Ala Tyr
465

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2338 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA
(ix) FEATURE:

(A) NAME/KEY: Coding Sequence

(B) LOCATION: 146...2218
(D) OTHER INFORMATION:

(A) NAME/KEY: Signal Sequence

(B) LOCATION: 146...200
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

ACTTAAAATT GTTTTTTTTT TTTTTCAAAA TATAAATTTT AAGCCAAAAA TAAGCATTTT 60
ATGGTAAAAT GGCGAACTTT CATAAACATG ACTATTATGG GAATGTCATG GGAATGTGAA 120

GAAAAATCTA TTAAAA GGA GAA AAC ATG AAA AAA TCC CTC TTA CTC TCT CTT 172
Met Lys Lys Ser Leu Leu Leu Ser Leu
-18 -15 -10

TCT CTC ATC GCT TCC TTA TCA AGA GCT GAA GAT GAC GGA TTT TAT ACG 220
Ser Leu Ile Ala Ser Leu Ser Arg Ala Glu Asp Asp Gly Phe Tyr Thr
-5 1 5

AGT GTG GGC TAT CAG ATC GGT GAA GCG GTC CAA CAA GTG AAA AAC ACA 268
Ser Val Gly Tyr Gln Ile Gly Glu Ala Val Gln Gln Val Lys Asn Thr
10 15 20

GGA GCA TTG CAA AAT CTT GCA GAC AGA TAC GAT AAC TTA AAC AAC CTT 316
Gly Ala Leu Gln Asn Leu Ala Asp Arg Tyr Asp Asn Leu Asn Asn Leu
25 30 35

TTA AAC CAA TAC AAT TAT TTA AAT TCC TTA GTC AAT TTA GCC AGC ACG 364
Leu Asn Gln Tyr Asn Tyr Leu Asn Ser Leu Val Asn Leu Ala Ser Thr
40 45 50 55

CCG AGC GCG ATC ACC GGT GCG ATT GAT AAT TTA AGC TCA AGC GCG ATT 412
Pro Ser Ala Ile Thr Gly Ala Ile Asp Asn Leu Ser Ser Ser Ala Ile
60 65 70

AAC CTC ACT AGC GCC ACC ACC ACT TCC CCC GCC TAT CAA GCT GTG GCT 460
Asn Leu Thr Ser Ala Thr Thr Thr Ser Pro Ala Tyr Gln Ala Val Ala
75 80 85

TTA GCG CTC AAT GCC GCT GTG GGC ATG TGG CAA GTC ATA GCC CTT TTT 508
Leu Ala Leu Asn Ala Ala Val Gly Met Trp Gln Val Ile Ala Leu Phe
90 95 100

ATT GGC TGT GGC CCT GGC CCT ACC AAT AAT CAA AGC TAT CAA TCG TTT 556
Ile Gly Cys Gly Pro Gly Pro Thr Asn Asn Gln Ser Tyr Gln Ser Phe
105 110 115

GGT AAC ACA CCA GCC CTT AAT GGG ACC ACC ACC ACT TGC AAT CAA GCA 604
Gly Asn Thr Pro Ala Leu Asn Gly Thr Thr Thr Thr Cys Asn Gln Ala
120 125 130 135

TAT GGG ACA GGC CCT AAT GGC ATC CTA TCT ATT GAT GAA TAC CAA AAA 652
Tyr Gly Thr Gly Pro Asn Gly Ile Leu Ser Ile Asp Glu Tyr Gln Lys
140 145 150

CTC AAC CAA GCT TAT CAG ATC ATC CAA ACC GCT TTA AAC CAA AAT CAA 700
Leu Asn Gln Ala Tyr Gln Ile Ile Gln Thr Ala Leu Asn Gln Asn Gln
155 160 165

GGG GGT GGG ATG CCT GCC TTG AAT GAC ACC ACC AAA ACA GGG GTA GTC 748
Gly Gly Gly Met Pro Ala Leu Asn Asp Thr Thr Lys Thr Gly Val Val
170 175 180

AAC ATA CAA CAA ACC AAT TAT AGG ACC ACC ACA CAA AAC AAT ATC ATA 796
Asn Ile Gln Gln Thr Asn Tyr Arg Thr Thr Thr Gln Asn Asn Ile Ile
185 190 195

GAG CAT TAT TAT ACA GAG AAT GGG AAA GAG ATC CCA GTC TCT TAT TCA 844
Glu His Tyr Tyr Thr Glu Asn Gly Lys Glu Ile Pro Val Ser Tyr Ser
200 205 210 215

GGC GGA TCA TCA TTC TCG CCT ACA ATA CAA TTG ACA TAC CAT AAT AAC 892
Gly Gly Ser Ser Phe Ser Pro Thr Ile Gln Leu Thr Tyr His Asn Asn
220 225 230

GCT GAA AAC CTT TTG CAA CAA GCC GCC ACT ATC ATG CAA GTC CTT ATT 940
Ala Glu Asn Leu Leu Gln Gln Ala Ala Thr Ile Met Gln Val Leu Ile
235 240 245

ACT CAA AAG CCG CAT GTG CAA ACG AGC AAT GGC GGT AAA GCG TGG GGG 988
Thr Gln Lys Pro His Val Gln Thr Ser Asn Gly Gly Lys Ala Trp Gly
250 255 260

TTG AGT TCT ACG CCT GGG AAT GTG ATG GAT ATT TTT GGT CCT TCT TTT 1036
Leu Ser Ser Thr Pro Gly Asn Val Met Asp Ile Phe Gly Pro Ser Phe
265 270 275

AAC GCT ATT AAT GAG ATG ATT AAA AAC GCT CAA ACA GCC CTA GCA AAA 1084
Asn Ala Ile Asn Glu Met Ile Lys Asn Ala Gln Thr Ala Leu Ala Lys
280 285 290 295

ACC CAA CAG CTT AAC GCT AAT GAA AAC GCC CAA ATC ACG CAA CCC AAC 1132
Thr Gln Gln Leu Asn Ala Asn Glu Asn Ala Gln Ile Thr Gln Pro Asn
300 305 310

AAT TTC AAC CCC TAC ACC TCT AAA GAC AAA GGG TTC GCT CAA GAA ATG 1180
Asn Phe Asn Pro Tyr Thr Ser Lys Asp Lys Gly Phe Ala Gln Glu Met
315 320 325

CTC AAT AGA GCT GAA GCT CAA GCA GAG ATT TTA AAT TTA GCT AAG CAA 1228
Leu Asn Arg Ala Glu Ala Gln Ala Glu Ile Leu Asn Leu Ala Lys Gln
330 335 340

GTA GCG AAC AAT TTC CAC AGC ATT CAA GGG CCT ATT CAA GGG GAT TTA 1276
Val Ala Asn Asn Phe His Ser Ile Gln Gly Pro Ile Gln Gly Asp Leu
345 350 355

GAA GAA TGT AAA GCA GGA TCG GCT GGC GTG ATC ACT AAT AAC ACT TGG 1324
Glu Glu Cys Lys Ala Gly Ser Ala Gly Val Ile Thr Asn Asn Thr Trp
360 365 370 375

GGT TCA GGT TGC GCG TTT GTG AAA GAA ACT TTA AAC TCT TTA GAG CAA 1372
Gly Ser Gly Cys Ala Phe Val Lys Glu Thr Leu Asn Ser Leu Glu Gln
380 385 390

CAC ACC GCT TAT TAC GGC AAC CAG GTC AAT CAG GAT AGG GCT TTG GCT 1420
His Thr Ala Tyr Tyr Gly Asn Gln Val Asn Gln Asp Arg Ala Leu Ala
395 400 405

CAA ACC ATT TTG AAT TTT AAA GAA GCC CTT AAC ACC CTG AAT AAA GAC 1468
Gln Thr Ile Leu Asn Phe Lys Glu Ala Leu Asn Thr Leu Asn Lys Asp
410 415 420

TCA AAA GCG ATC AAT AGC GGT ATC TCC AAC TTG CCT AAC GCT AAA TCT 1516
Ser Lys Ala Ile Asn Ser Gly Ile Ser Asn Leu Pro Asn Ala Lys Ser
425 430 435

CTT CAA AAC ATG ACG CAT GCC ACT CAA AAC CCT AAT TCC CCA GAA GGT 1564
Leu Gln Asn Met Thr His Ala Thr Gln Asn Pro Asn Ser Pro Glu Gly
440 445 450 455

CTG CTC ACT TAT TCT TTG GAT TCA AGC AAA TAC AAC CAG CTC CAA ACC 1612
Leu Leu Thr Tyr Ser Leu Asp Ser Ser Lys Tyr Asn Gln Leu Gln Thr
460 465 470

ATC GCG CAA GAA TTG GGC AAA AAC CCT TTC AGG CGC TTT GGC GTG ATT 1660
Ile Ala Gln Glu Leu Gly Lys Asn Pro Phe Arg Arg Phe Gly Val Ile
475 480 485

GAC TTT CAA AAC AAC AAC GGC GCA ATG AAC GGG ATC GGC GTG CAA GTG 1708
Asp Phe Gln Asn Asn Asn Gly Ala Met Asn Gly Ile Gly Val Gln Val
490 495 500

GGT TAT AAA CAA TTC TTT GGT AAA AAA AGG AAT TGG GGG TTA AGG TAT 1756
Gly Tyr Lys Gln Phe Phe Gly Lys Lys Arg Asn Trp Gly Leu Arg Tyr
505 510 515

TAT GGT TTC TTT GAT TAT AAC CAT GCT TAT ATC AAA TCT AAT TTT TTC 1804
Tyr Gly Phe Phe Asp Tyr Asn His Ala Tyr Ile Lys Ser Asn Phe Phe
520 525 530 535

AAC TCC GCT TCT GAT GTG TGG ACT TAT GGG GTG GGT ATG GAC GCT CTC 1852
Asn Ser Ala Ser Asp Val Trp Thr Tyr Gly Val Gly Met Asp Ala Leu
540 545 550

TAT AAC TTC ATC AAC GAT AAA AAC ACC AAC TTT TTA GGC AAG AAC AAC 1900
Tyr Asn Phe Ile Asn Asp Lys Asn Thr Asn Phe Leu Gly Lys Asn Asn
555 560 565

AAG CTT TCA GTA GGG CTT TTT GGA GGC TTT GCG TTA GCC GGG ACT TCG 1948
Lys Leu Ser Val Gly Leu Phe Gly Gly Phe Ala Leu Ala Gly Thr Ser
570 575 580

TGG CTT AAT TCC CAA CAA GTG AAT TTG ACC ATG ATG AAT GGC ATT TAT 1996
Trp Leu Asn Ser Gln Gln Val Asn Leu Thr Met Met Asn Gly Ile Tyr
585 590 595

AAC GCT AAT GTC AGC ACT TCT AAC TTC CAA TTT TTG TTT GAT TTA GGC 2044
Asn Ala Asn Val Ser Thr Ser Asn Phe Gln Phe Leu Phe Asp Leu Gly
600 605 610 615

TTG AGA ATG AAC CTC GCT AGG CCT AAG AAA AAA GAC AGC GAT CAT GCC 2092
Leu Arg Met Asn Leu Ala Arg Pro Lys Lys Lys Asp Ser Asp His Ala
620 625 630

GCT GAG CAT GGC ATT GAA GTA GGT TTT AAG ATC CCC ACG ATC AAC ACC 2140
Ala Gln His Gly Ile Glu Leu Gly Phe Lys Ile Pro Thr Ile Asn Thr
635 640 645

AAC TAT TAT TCT TTC ATG GGC GCT AAA CTA GAA TAC AGA AGG ATG TAT 2188
Asn Tyr Tyr Ser Phe Met Gly Ala Lys Leu Glu Tyr Arg Arg Met Tyr
650 655 660

AGC CTT TTT CTC AAT TAT GTG TTT GCT TAC TAAAAACTCT CTTTAAAAAA GGG 2241
Ser Leu Phe Leu Asn Tyr Val Phe Ala Tyr
665 670

GTTTGTTTAA AAACGCTTAA AAGCATTTTT AAAATTAAGC AGTAAAGAGC CTAGATAATC 2301
TCTTGCAACC GCTCTCAAGC GATAAAATTA AAGTGAT 2338
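
As a quick consistency check on the feature table for SEQ ID NO: 21, the coding-sequence coordinates (146...2218) should span a whole number of codons, and the first codons should translate to the residues printed beneath them. The following is a minimal Python sketch under stated assumptions: it uses the standard genetic code (the listing does not name a translation table), and the helper names `codon_count` and `translate` are illustrative, not part of the listing.

```python
# Standard genetic code in TCAG order (assumed; the listing names no table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_count(start: int, end: int) -> int:
    """Number of codons in a 1-based, inclusive coordinate range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("range is not a whole number of codons")
    return length // 3


def translate(dna: str) -> str:
    """Translate DNA into one-letter amino acid codes, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First nine codons of the coding sequence as printed (positions 146-172).
cds_start = "ATGAAAAAATCCCTCTTACTCTCTCTT"

print(codon_count(146, 2218))  # 691
print(translate(cds_start))    # MKKSLLLSL
```

The codon count agrees with the 691 amino acids stated for SEQ ID NO: 22, and `MKKSLLLSL` corresponds to the printed Met Lys Lys Ser Leu Leu Leu Ser Leu.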

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 691 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(v) FRAGMENT TYPE: internal 
(ix) FEATURE: 

(A) NAME/KEY: Signal Sequence 

(B) LOCATION: 1 ... 18 
(D) OTHER INFORMATION: 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Met Lys Lys Ser Leu Leu Leu Ser Leu Ser Leu Ile Ala Ser Leu Ser
-18 -15 -10 -5

Arg Ala Glu Asp Asp Gly Phe Tyr Thr Ser Val Gly Tyr Gln Ile Gly
1 5 10

Glu Ala Val Gln Gln Val Lys Asn Thr Gly Ala Leu Gln Asn Leu Ala
15 20 25 30

Asp Arg Tyr Asp Asn Leu Asn Asn Leu Leu Asn Gln Tyr Asn Tyr Leu
35 40 45

Asn Ser Leu Val Asn Leu Ala Ser Thr Pro Ser Ala Ile Thr Gly Ala
50 55 60

Ile Asp Asn Leu Ser Ser Ser Ala Ile Asn Leu Thr Ser Ala Thr Thr
65 70 75

Thr Ser Pro Ala Tyr Gln Ala Val Ala Leu Ala Leu Asn Ala Ala Val
80 85 90

Gly Met Trp Gln Val Ile Ala Leu Phe Ile Gly Cys Gly Pro Gly Pro
95 100 105 110

Thr Asn Asn Gln Ser Tyr Gln Ser Phe Gly Asn Thr Pro Ala Leu Asn
115 120 125

Gly Thr Thr Thr Thr Cys Asn Gln Ala Tyr Gly Thr Gly Pro Asn Gly
130 135 140

Ile Leu Ser Ile Asp Glu Tyr Gln Lys Leu Asn Gln Ala Tyr Gln Ile
145 150 155

Ile Gln Thr Ala Leu Asn Gln Asn Gln Gly Gly Gly Met Pro Ala Leu
160 165 170

Asn Asp Thr Thr Lys Thr Gly Val Val Asn Ile Gln Gln Thr Asn Tyr
175 180 185 190

Arg Thr Thr Thr Gln Asn Asn Ile Ile Glu His Tyr Tyr Thr Glu Asn
195 200 205

Gly Lys Glu Ile Pro Val Ser Tyr Ser Gly Gly Ser Ser Phe Ser Pro
210 215 220

Thr Ile Gln Leu Thr Tyr His Asn Asn Ala Glu Asn Leu Leu Gln Gln
225 230 235

Ala Ala Thr Ile Met Gln Val Leu Ile Thr Gln Lys Pro His Val Gln
240 245 250

Thr Ser Asn Gly Gly Lys Ala Trp Gly Leu Ser Ser Thr Pro Gly Asn
255 260 265 270

Val Met Asp Ile Phe Gly Pro Ser Phe Asn Ala Ile Asn Glu Met Ile
275 280 285

Lys Asn Ala Gln Thr Ala Leu Ala Lys Thr Gln Gln Leu Asn Ala Asn
290 295 300

Glu Asn Ala Gln Ile Thr Gln Pro Asn Asn Phe Asn Pro Tyr Thr Ser
305 310 315

Lys Asp Lys Gly Phe Ala Gln Glu Met Leu Asn Arg Ala Glu Ala Gln
320 325 330

Ala Glu Ile Leu Asn Leu Ala Lys Gln Val Ala Asn Asn Phe His Ser
335 340 345 350

Ile Gln Gly Pro Ile Gln Gly Asp Leu Glu Glu Cys Lys Ala Gly Ser
355 360 365

Ala Gly Val Ile Thr Asn Asn Thr Trp Gly Ser Gly Cys Ala Phe Val
370 375 380

Lys Glu Thr Leu Asn Ser Leu Glu Gln His Thr Ala Tyr Tyr Gly Asn
385 390 395

Gln Val Asn Gln Asp Arg Ala Leu Ala Gln Thr Ile Leu Asn Phe Lys
400 405 410

Glu Ala Leu Asn Thr Leu Asn Lys Asp Ser Lys Ala Ile Asn Ser Gly
415 420 425 430

Ile Ser Asn Leu Pro Asn Ala Lys Ser Leu Gln Asn Met Thr His Ala
435 440 445

Thr Gln Asn Pro Asn Ser Pro Glu Gly Leu Leu Thr Tyr Ser Leu Asp
450 455 460

Ser Ser Lys Tyr Asn Gln Leu Gln Thr Ile Ala Gln Glu Leu Gly Lys
465 470 475

Asn Pro Phe Arg Arg Phe Gly Val Ile Asp Phe Gln Asn Asn Asn Gly
480 485 490

Ala Met Asn Gly Ile Gly Val Gln Val Gly Tyr Lys Gln Phe Phe Gly
495 500 505 510

Lys Lys Arg Asn Trp Gly Leu Arg Tyr Tyr Gly Phe Phe Asp Tyr Asn
515 520 525

His Ala Tyr Ile Lys Ser Asn Phe Phe Asn Ser Ala Ser Asp Val Trp
530 535 540

Thr Tyr Gly Val Gly Met Asp Ala Leu Tyr Asn Phe Ile Asn Asp Lys
545 550 555

Asn Thr Asn Phe Leu Gly Lys Asn Asn Lys Leu Ser Val Gly Leu Phe
560 565 570

Gly Gly Phe Ala Leu Ala Gly Thr Ser Trp Leu Asn Ser Gln Gln Val
575 580 585 590

Asn Leu Thr Met Met Asn Gly Ile Tyr Asn Ala Asn Val Ser Thr Ser
595 600 605

Asn Phe Gln Phe Leu Phe Asp Leu Gly Leu Arg Met Asn Leu Ala Arg
610 615 620

Pro Lys Lys Lys Asp Ser Asp His Ala Ala Gln His Gly Ile Glu Leu
625 630 635

Gly Phe Lys Ile Pro Thr Ile Asn Thr Asn Tyr Tyr Ser Phe Met Gly
640 645 650

Ala Lys Leu Glu Tyr Arg Arg Met Tyr Ser Leu Phe Leu Asn Tyr Val
655 660 665 670

Phe Ala Tyr



(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
TCAAGGAGAA AACATGAAAA AAACCC 26

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
GAAGACGACG GCTTTTACAC AAGCGT 26

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
AAAGCTTAGT AAGCGAACAC ATAA 24

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
AAGGAGAAAA AACATGAAAA AACACATCC 29

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
GAAGACGACG GCTTTTACAC AAGCG 25

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
AACATTAGTA AGCGAACACA TAGTTC 26

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
AAGGAGAAAA AACATGAAAA AACACATCC 29

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
GAAGACGACG GCTTTTACAC AAGCGT 26

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
AAAAGCTTAG TAAGCGAACA CAT 23

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
AAGGAGAAAA CATGAAGAAA AAATTT 26

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
GAAGACAACG GCTTTTTTGT GAGTG 25

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
AGCTTTTAGT AAGCAAACAC ATAGT 25

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
AAGGATATTT ATGAAAAAAA CCCTT 25

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
GAAGACAACG GCTTTTTTAT CAGCG 25

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
GATATTAGTA AGCAAACACA TAATTC 26

(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
AAGGAGAAAA CATGAAAAAA TCCCTCT 27

(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
GAAGATGACG GATTTTATAC GAGTGT 26

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
TTTTAGTAAG CAAACACATA ATTGAG 26

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
AAGGAACATC TTATGAAAAA AACG 24

(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
GAAGACAACG GCGTTTTTTT AAGCG 25

(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
GGTTTTTAAT AGGCAAACAC ATAAT 25

(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
AAGGAACATT TTATGAAAAA GACAAT 26

(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:
GAAGACAACG GCTTTTTTGT GAGCG 25



(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
TCACTCAGTA AGCGAACACA TAA 23

(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
AAGGAACATT TTATGAAAAA GACAA 25

(2) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
GAAGACAACG GCTTTTTTGT GAGCG 25

(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
TTTTAATAAG CGAACACATA AAAGAG 26

(2) INFORMATION FOR SEQ ID NO: 50:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:
AAGGAACATT TTATGAAAAA AACGAT 26

(2) INFORMATION FOR SEQ ID NO: 51:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:
GAAAATGACG GCGTTTTTAT GAGCG 25

(2) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
ATATCAATAG GCCAAAACAT AATTGA 26

(2) INFORMATION FOR SEQ ID NO: 53:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
AAGGAGAAAA CATGAAAAAA TCCCTC 26

(2) INFORMATION FOR SEQ ID NO: 54:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
GAAGATGACG GATTTTATAC GAGTGT 26

(2) INFORMATION FOR SEQ ID NO: 55:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:
TTTTAGTAAG CAAACACATA ATTGAG 26
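
The oligonucleotides of SEQ ID NO: 53 and SEQ ID NO: 55 can be compared against the ends of the coding region printed in SEQ ID NO: 21. The following Python sketch is an illustration only: the two fragments are copied from positions 121-172 and 2189-2228 of SEQ ID NO: 21, and the helper name `reverse_complement` is hypothetical. This part of the listing does not state how the oligonucleotides were used, so the sketch checks sequence identity and nothing more.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Fragments of SEQ ID NO: 21 as printed (spaces removed).
five_prime = "GAAAAATCTATTAAAAGGAGAAAACATGAAAAAATCCCTCTTACTCTCTCTT"   # 121-172
three_prime = "AGCCTTTTTCTCAATTATGTGTTTGCTTACTAAAAACTCT"              # 2189-2228

seq53 = "AAGGAGAAAACATGAAAAAATCCCTC"   # SEQ ID NO: 53
seq55 = "TTTTAGTAAGCAAACACATAATTGAG"   # SEQ ID NO: 55

print(seq53 in five_prime)                       # True: sense-strand match at the 5' end
print(reverse_complement(seq55))                 # CTCAATTATGTGTTTGCTTACTAAAA
print(reverse_complement(seq55) in three_prime)  # True: antisense match spanning the stop codon
```

On this reading, SEQ ID NO: 53 overlaps the start codon on the sense strand and SEQ ID NO: 55 is the reverse complement of the region ending just past the TAA stop codon.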

(2) INFORMATION FOR SEQ ID NO: 56:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:
CGCGGATCCG AATCCAATTT AATCCAAAAA GG 32

(2) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:
CCGCTCGAGT TAAGTAAGCG AACACATATT CAA 33

(2) INFORMATION FOR SEQ ID NO: 58:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:

Glu Asp Asp Gly Phe Tyr Thr Ser Val Gly Tyr Gln Ile Gly Glu Ala
1 5 10 15

Ala Gln Met Val
20
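
For comparison with one-letter sequence databases, the three-letter peptide of SEQ ID NO: 58 can be converted mechanically. The following Python sketch assumes the standard three-letter codes (so Gln and Ile for the residues at positions 12 and 13); the function name `to_one_letter` is illustrative and not part of the listing.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter: str) -> str:
    """Convert a whitespace-separated three-letter sequence to one-letter codes.

    Raises KeyError on any token that is not a standard residue code,
    which also flags OCR damage such as 'Gin' or 'He'.
    """
    return "".join(THREE_TO_ONE[token] for token in three_letter.split())


seq58 = ("Glu Asp Asp Gly Phe Tyr Thr Ser Val Gly Tyr Gln Ile Gly Glu Ala "
         "Ala Gln Met Val")

print(to_one_letter(seq58))       # EDDGFYTSVGYQIGEAAQMV
print(len(to_one_letter(seq58)))  # 20
```

The length of 20 agrees with the LENGTH field of the record.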



(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:
CTGAATTCGA TTTCAAGGAG AAAACATGAA A 31

(2) INFORMATION FOR SEQ ID NO: 60:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:
CCGCTCGAGT TAGTAAGCGA ACACATAATT 30

(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:
CGCGGATCCG AATCCAATTT AATCCAAAAA GG 32

(2) INFORMATION FOR SEQ ID NO: 62:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:
CCGCTCGAGT TAGTAAGCGA ACACATAGTT CAA 33

(2) INFORMATION FOR SEQ ID NO: 63:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:
CGCGGATCCG AAGTTTCTTT GTATCAAAG 29

(2) INFORMATION FOR SEQ ID NO: 64:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:
CCGCTCGAGT TAGTAAGCAA ACACATAATT GTG 33
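
Several of the oligonucleotides in SEQ ID NOs: 56-64 begin with short 5' extensions that contain well-known restriction enzyme recognition sequences. The following Python sketch scans for four common sites. It is an assumption-laden illustration: the enzyme names and the choice of sites are not taken from this part of the listing, and the names `SITES`, `PRIMERS`, and `find_sites` are hypothetical.

```python
# Recognition sequences of four common restriction enzymes (assumed relevant;
# the listing itself does not name any enzyme here).
SITES = {
    "BamHI": "GGATCC",
    "XhoI": "CTCGAG",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
}

# Oligonucleotides as printed in the listing, keyed by SEQ ID NO (spaces removed).
PRIMERS = {
    56: "CGCGGATCCGAATCCAATTTAATCCAAAAAGG",
    57: "CCGCTCGAGTTAAGTAAGCGAACACATATTCAA",
    59: "CTGAATTCGATTTCAAGGAGAAAACATGAAA",
    60: "CCGCTCGAGTTAGTAAGCGAACACATAATT",
    63: "CGCGGATCCGAAGTTTCTTTGTATCAAAG",
    64: "CCGCTCGAGTTAGTAAGCAAACACATAATTGTG",
}


def find_sites(seq: str) -> dict:
    """Map enzyme name to the 0-based index of its first site in seq."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


for seq_id, seq in PRIMERS.items():
    print(seq_id, find_sites(seq))
```

Under these assumptions the scan reports a BamHI-type site in SEQ ID NOs: 56 and 63, an XhoI-type site in SEQ ID NOs: 57, 60, and 64, and an EcoRI-type site in SEQ ID NO: 59, each within the first nine bases.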



(2) INFORMATION FOR SEQ ID NO: 65:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1149 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 106...1002
(D) OTHER INFORMATION:

(A) NAME/KEY: Signal Sequence
(B) LOCATION: 106...166
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:

TTACTCTTTA ATGTGAGTTT TCTGTGTCAT GATAGCTGAT TTTGTTTTAA ATTTGCTATA 60

ATGTGAATTT AATGATGAAA ATTAGTTTAG AGTGGAGAAC ACACA ATG AAA AAA AAT 117
Met Lys Lys Asn
-20

ATC TTA AAT TTA GCG TTA GTG GGT GCG TTG AGC ACG TCG TTT TTG ATG 165
Ile Leu Asn Leu Ala Leu Val Gly Ala Leu Ser Thr Ser Phe Leu Met
-15 -10 -5

GCT AAG CCG GCT CAT AAC GCA AAT AAC GCT ACG CAT AAC ACG AAA AAA 213
Ala Lys Pro Ala His Asn Ala Asn Asn Ala Thr His Asn Thr Lys Lys
1 5 10 15

ACG ACT GAT TCT TCA GCA GGC GTG TTA GCG ACA GTG GAT GGC AGA CCT 261
Thr Thr Asp Ser Ser Ala Gly Val Leu Ala Thr Val Asp Gly Arg Pro
20 25 30

ATC ACT AAA AGC GAT TTT GAC ATG ATT AAG CAA CGA AAT CCT AAT TTT 309
Ile Thr Lys Ser Asp Phe Asp Met Ile Lys Gln Arg Asn Pro Asn Phe
35 40 45

GAT TTT GAC AAG CTT AAA GAG AAA GAA AAA GAA GCC TTG ATT GAT CAA 357
Asp Phe Asp Lys Leu Lys Glu Lys Glu Lys Glu Ala Leu Ile Asp Gln
50 55 60

GCT ATT CGC ACC GCC CTT GTA GAA AAT GAA GCT AAA ACC GAG AAA TTG 405
Ala Ile Arg Thr Ala Leu Val Glu Asn Glu Ala Lys Thr Glu Lys Leu
65 70 75 80

GAC AGC ACT CCA GAA TTT AAA GCG ATG ATG GAA GCG GTT AAA AAA CAG 453
Asp Ser Thr Pro Glu Phe Lys Ala Met Met Glu Ala Val Lys Lys Gln
85 90 95

GCT TTA GTG GAA TTT TGG GCT AAA AAA CAG GCT GAA GAA GTG AAA AAA 501
Ala Leu Val Glu Phe Trp Ala Lys Lys Gln Ala Glu Glu Val Lys Lys
100 105 110

GTC CAA ATC CCA GAA AAA GAA ATG CAA GAT TTT TAC AAC GCT AAC AAA 549
Val Gln Ile Pro Glu Lys Glu Met Gln Asp Phe Tyr Asn Ala Asn Lys
115 120 125

GAT CAG CTT TTT GTC AAG CAA GAA GCC CAT GCT AGG CAT ATT TTA GTG 597
Asp Gln Leu Phe Val Lys Gln Glu Ala His Ala Arg His Ile Leu Val
130 135 140

AAA ACC GAA GAT GAG GCT AAA CGG ATT ATT TCT GAG ATT GAC AAA CAG 645
Lys Thr Glu Asp Glu Ala Lys Arg Ile Ile Ser Glu Ile Asp Lys Gln
145 150 155 160

CCA AAG GCT AAA AAA GAA GCT AAA TTC ATT GAG TTA GCC AAT CGG GAT 693
Pro Lys Ala Lys Lys Glu Ala Lys Phe Ile Glu Leu Ala Asn Arg Asp
165 170 175

ACG ATT GAT CCT AAC AGC AAG AAC GCG CAA AAT GGC GGT GAT TTG GGG 741
Thr Ile Asp Pro Asn Ser Lys Asn Ala Gln Asn Gly Gly Asp Leu Gly
180 185 190

AAA TTC CAA AAG AAC CAA ATG GCT CCG GAT TTT TCT AAA GCC GCT TTC 789
Lys Phe Gln Lys Asn Gln Met Ala Pro Asp Phe Ser Lys Ala Ala Phe
195 200 205

GCT TTA ACT CCT GGG GAT TAC ACT AAA ACC CCT GTT AAA ACA GAG TTT 837
Ala Leu Thr Pro Gly Asp Tyr Thr Lys Thr Pro Val Lys Thr Glu Phe
210 215 220

GGT TAT CAT ATT ATC TAT TTG ATT TCT AAA GAT AGC CCT GTA ACT TAT 885
Gly Tyr His Ile Ile Tyr Leu Ile Ser Lys Asp Ser Pro Val Thr Tyr
225 230 235 240

ACT TAT GAA CAG GCT AAA CCT ACC ATT AAG GGG ATG TTA CAA GAA AAG 933
Thr Tyr Glu Gln Ala Lys Pro Thr Ile Lys Gly Met Leu Gln Glu Lys
245 250 255

CTT TTC CAA GAA CGC ATG AAT CAA CGC ATT GAG GAA CTA AGA AAG CAC 981
Leu Phe Gln Glu Arg Met Asn Gln Arg Ile Glu Glu Leu Arg Lys His
260 265 270

GCT AAA ATT GTT ATC AAC AAG TAATTGATGA GGTGTTATCA TGTTAGTTAA AGGC 1036
Ala Lys Ile Val Ile Asn Lys
275

AATGAAATTT TATTGAAAGC CCATAAAGAA GGTTATGGGG TGGGGGCGTT TAATTTCGTG 1096
AATTTTGAAA TGCTAAACGC TATTTTTGAA GCAGGAAATG AGGAAAATTC CCC 1149



(2) INFORMATION FOR SEQ ID NO: 66:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 299 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal
(ix) FEATURE:
(A) NAME/KEY: Signal Sequence
(B) LOCATION: 1...20
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:

Met Lys Lys Asn Ile Leu Asn Leu Ala Leu Val Gly Ala Leu Ser Thr
-20 -15 -10 -5

Ser Phe Leu Met Ala Lys Pro Ala His Asn Ala Asn Asn Ala Thr His
1 5 10

Asn Thr Lys Lys Thr Thr Asp Ser Ser Ala Gly Val Leu Ala Thr Val
15 20 25

Asp Gly Arg Pro Ile Thr Lys Ser Asp Phe Asp Met Ile Lys Gln Arg
30 35 40

Asn Pro Asn Phe Asp Phe Asp Lys Leu Lys Glu Lys Glu Lys Glu Ala
45 50 55 60

Leu Ile Asp Gln Ala Ile Arg Thr Ala Leu Val Glu Asn Glu Ala Lys
65 70 75

Thr Glu Lys Leu Asp Ser Thr Pro Glu Phe Lys Ala Met Met Glu Ala
80 85 90

Val Lys Lys Gln Ala Leu Val Glu Phe Trp Ala Lys Lys Gln Ala Glu
95 100 105

Glu Val Lys Lys Val Gln Ile Pro Glu Lys Glu Met Gln Asp Phe Tyr
110 115 120

Asn Ala Asn Lys Asp Gln Leu Phe Val Lys Gln Glu Ala His Ala Arg
125 130 135 140

His Ile Leu Val Lys Thr Glu Asp Glu Ala Lys Arg Ile Ile Ser Glu
145 150 155

Ile Asp Lys Gln Pro Lys Ala Lys Lys Glu Ala Lys Phe Ile Glu Leu
160 165 170

Ala Asn Arg Asp Thr Ile Asp Pro Asn Ser Lys Asn Ala Gln Asn Gly
175 180 185

Gly Asp Leu Gly Lys Phe Gln Lys Asn Gln Met Ala Pro Asp Phe Ser
190 195 200

Lys Ala Ala Phe Ala Leu Thr Pro Gly Asp Tyr Thr Lys Thr Pro Val
205 210 215 220

Lys Thr Glu Phe Gly Tyr His Ile Ile Tyr Leu Ile Ser Lys Asp Ser
225 230 235

Pro Val Thr Tyr Thr Tyr Glu Gln Ala Lys Pro Thr Ile Lys Gly Met
240 245 250

Leu Gln Glu Lys Leu Phe Gln Glu Arg Met Asn Gln Arg Ile Glu Glu
255 260 265

Leu Arg Lys His Ala Lys Ile Val Ile Asn Lys
270 275



(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1448 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA
(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 118...1314
(D) OTHER INFORMATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:

CTCTTGAATG GCGATAAGAC AAAAATGTCT TAAATTTTGT GGTAGCATTT AGGAATACTT 60
AGGATTTTGT TTAGTATAAT TCTAAAATCC ATTTCAAAAA ATTAAGGAGA AATACAA ATG 120
Met
1

GCA AAA GAA AAG TTT AAC AGA ACT AAG CCG CAT GTT AAT ATT GGA ACC 168
Ala Lys Glu Lys Phe Asn Arg Thr Lys Pro His Val Asn Ile Gly Thr
5 10 15

ATT GGG CAT GTA GAC CAT GGT AAA ACG ACT TTG AGT GCA GCG ATT TCA 216
Ile Gly His Val Asp His Gly Lys Thr Thr Leu Ser Ala Ala Ile Ser
20 25 30

GCG GTG CTT TCT TTG AAA GGT CTT GCA GAA ATG AAA GAC TAT GAT AAT 264
Ala Val Leu Ser Leu Lys Gly Leu Ala Glu Met Lys Asp Tyr Asp Asn
35 40 45

ATT GAT AAC GCC CCT GAA GAA AAA GAA AGA GGG ATC ACT ATC GCT ACT 312
Ile Asp Asn Ala Pro Glu Glu Lys Glu Arg Gly Ile Thr Ile Ala Thr
50 55 60 65

TCT CAC ATT GAA TAT GAG ACT GAA AAC AGA CAC TAT GCG CAT GTG GAT 360
Ser His Ile Glu Tyr Glu Thr Glu Asn Arg His Tyr Ala His Val Asp
70 75 80

TGC CCA GGA CAC GCT GAC TAT GTA AAA AAC ATG ATC ACC GGT GCG GCG 408
Cys Pro Gly His Ala Asp Tyr Val Lys Asn Met Ile Thr Gly Ala Ala
85 90 95

CAA ATG GAC GGA GCG ATT TTG GTT GTT TCT GCA GCT GAT GGC CCT ATG 456
Gln Met Asp Gly Ala Ile Leu Val Val Ser Ala Ala Asp Gly Pro Met
100 105 110

CCT CAA ACT AGG GAG CAT ATC TTA TTG TCT CGT CAA GTA GGC GTG CCT 504
Pro Gln Thr Arg Glu His Ile Leu Leu Ser Arg Gln Val Gly Val Pro
115 120 125

CAC ATC GTT GTT TTC TTA AAC AAA CAA GAC ATG GTA GAT GAC CAA GAA 552
His Ile Val Val Phe Leu Asn Lys Gln Asp Met Val Asp Asp Gln Glu
130 135 140 145

TTG TTA GAA CTT GTA GAA ATG GAA GTG CGC GAA TTG TTG AGC GCG TAT 600
Leu Leu Glu Leu Val Glu Met Glu Val Arg Glu Leu Leu Ser Ala Tyr
150 155 160

GAA TTT CCT GGC GAT GAC ACT CCT ATC GTA GCG GGT TCA GCT TTA AGA 64 8 

Glu Phe Pro Gly Asp Asp Thr Pro He Val Ala Gly Ser Ala Leu Arg 

165 170 175 

GCT TTA GAA GAA GCA AAG GCT GGT AAT GTG GGT GAA TGG GGT GAA AAA 696 

Ala Leu Glu Glu Ala Lys Ala Gly Asn Val Gly Glu Trp Gly Glu Lys 

180 185 190 

GTG CTT AAA CTT ATG GCT GAA GTG GAT GCC TAT ATC CCT ACT CCA GAA 74 4 

Val Leu Lys Leu Met Ala Glu Val Asp Ala Tyr He Pro Thr Pro Glu 

195 200 205 

AGA GAC ACT GAA AAA ACT TTC TTG ATG CCG GTT GAA GAT GTG TTC TCT 7 92 

Arg Asp Thr Glu Lys Thr Phe Leu Met Pro Val Glu Asp Val Phe Ser 
210 215 220 225 

ATT GCG GGT AGA GGG ACT GTG GTT AC A GGT AGG ATT GAA AGA GGC GTG 84 0 

He Ala Gly Arg Gly Thr Val Val Thr Gly Arg He Glu Arg Gly Val 

230 235 240 

GTG AAA GTA GGC GAT GAA GTG GAA ATC GTT GGT ATC AGA CCT ACA CAA 888 

Val Lys Val Gly Asp Glu Val Glu He Val Gly He Arg Pro Thr Gin 

245 250 255 

AAA ACG ACT GTA ACC GGT GTA GAA ATG TTT AGG AAA GAG TTG GAA AAA 93 6 

Lys Thr Thr Val Thr Gly Val Glu Met Phe Arg Lys Glu Leu Glu Lys 

260 265 270 

GGT GAA GCC GGC GAT AAT GTG GGC GTG CTT TTG AGA GGA ACT AAA AAA 9 84 

Gly Glu Ala Gly Asp Asn Val Gly Val Leu Leu Arg Gly Thr Lys Lys 

275 280 285 

GAA GAA GTG GAA CGC GGT ATG GTT CTA TGC AAA CCA GGT TCT ATC ACT 10 32 

Glu Glu Val Glu Arg Gly Met Val Leu Cys Lys Pro Gly Ser He Thr 
290 295 300 305 

CCG CAC AAG AAA TTT GAG GGA GAA ATT TAT GTC CTT TCT AAA GAA GAA " 108 0 

Pro His Lys Lys Phe Glu Gly Glu He Tyr Val Leu Ser Lys Glu Glu 

310 315 320 

GGC GGG AGA CAC ACT CCA TTC TTC ACC AAT TAC CGC CCG CAA TTC TAT 112 8 

Gly Gly Arg His Thr Pro Phe Phe Thr Asn Tyr Arg Pro Gin Phe Tyr 

325 330 335 

GTG CGC ACA ACT GAT GTG ACT GGC TCT ATC ACC CTT CCT GAA GGC GTA 1176
Val Arg Thr Thr Asp Val Thr Gly Ser Ile Thr Leu Pro Glu Gly Val

GAA ATG GTT ATG CCT GGC GAT AAT GTG AAA ATC ACT GTA GAG TTG ATT 1224
Glu Met Val Met Pro Gly Asp Asn Val Lys Ile Thr Val Glu Leu Ile
355 360 365

AGC CCT GTT GCG TTA GAG TTG GGA ACT AAA TTT GCG ATT CGT GAA GGC 1272
Ser Pro Val Ala Leu Glu Leu Gly Thr Lys Phe Ala Ile Arg Glu Gly
370 375 380 385

GGT AGG ACC GTT GGT GCT GGT GTT GTG AGC AAT ATT ATT GAA TAATATTAG 1323
Gly Arg Thr Val Gly Ala Gly Val Val Ser Asn Ile Ile Glu
390 395

CAAAAAGAGA GTTACCATAA AGGGTCATTA TGAAAGTTAA AATAGGGTTG AAGTGTTCTG 1383
ATTGTGAAGA TATCAATTAC AGCACAACCA AGAACGCTAA AACTAACACT GAAAAACTGG 1443
AGCTT 1448
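
The feature table for SEQ ID NO:67 places the coding sequence at positions 118 to 1314, and SEQ ID NO:68 is listed as 399 amino acids long. The following minimal Python sketch checks that arithmetic and translates the first codons of the open reading frame. The helper names `coding_length` and `translate` are illustrative, the standard genetic code is assumed, and only the first 168 bases of the listing are reproduced.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order
# (assumption: the listing uses the standard code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# First 168 bases of SEQ ID NO:67 as printed in the listing; spaces are stripped.
SEQ67_START = (
    "CTCTTGAATG GCGATAAGAC AAAAATGTCT TAAATTTTGT GGTAGCATTT AGGAATACTT "
    "AGGATTTTGT TTAGTATAAT TCTAAAATCC ATTTCAAAAA ATTAAGGAGA AATACAA ATG "
    "GCA AAA GAA AAG TTT AAC AGA ACT AAG CCG CAT GTT AAT ATT GGA ACC"
).replace(" ", "")


def coding_length(start: int, end: int) -> int:
    """Length in bases of a feature given 1-based inclusive coordinates."""
    return end - start + 1


def translate(dna: str, start: int) -> str:
    """Translate dna from a 1-based start position, stopping at a stop codon."""
    protein = []
    for i in range(start - 1, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


n_bases = coding_length(118, 1314)
print(n_bases, n_bases // 3)        # 1197 399
print(translate(SEQ67_START, 118))  # MAKEKFNRTKPHVNIGT
```

The 1197 bases of the feature correspond to 399 codons, in agreement with the length given for SEQ ID NO:68, and the translated residues match the first 17 residues of that sequence.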

(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 399 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:68:

Met Ala Lys Glu Lys Phe Asn Arg Thr Lys Pro His Val Asn Ile Gly
1 5 10 15

Thr Ile Gly His Val Asp His Gly Lys Thr Thr Leu Ser Ala Ala Ile
20 25 30

Ser Ala Val Leu Ser Leu Lys Gly Leu Ala Glu Met Lys Asp Tyr Asp
35 40 45

Asn Ile Asp Asn Ala Pro Glu Glu Lys Glu Arg Gly Ile Thr Ile Ala
50 55 60

Thr Ser His Ile Glu Tyr Glu Thr Glu Asn Arg His Tyr Ala His Val
65 70 75 80

Asp Cys Pro Gly His Ala Asp Tyr Val Lys Asn Met Ile Thr Gly Ala
85 90 95

Ala Gln Met Asp Gly Ala Ile Leu Val Val Ser Ala Ala Asp Gly Pro
100 105 110

Met Pro Gln Thr Arg Glu His Ile Leu Leu Ser Arg Gln Val Gly Val
115 120 125

Pro His Ile Val Val Phe Leu Asn Lys Gln Asp Met Val Asp Asp Gln
130 135 140

Glu Leu Leu Glu Leu Val Glu Met Glu Val Arg Glu Leu Leu Ser Ala
145 150 155 160

Tyr Glu Phe Pro Gly Asp Asp Thr Pro Ile Val Ala Gly Ser Ala Leu
165 170 175

Arg Ala Leu Glu Glu Ala Lys Ala Gly Asn Val Gly Glu Trp Gly Glu
180 185 190

Lys Val Leu Lys Leu Met Ala Glu Val Asp Ala Tyr Ile Pro Thr Pro
195 200 205

Glu Arg Asp Thr Glu Lys Thr Phe Leu Met Pro Val Glu Asp Val Phe
210 215 220

Ser Ile Ala Gly Arg Gly Thr Val Val Thr Gly Arg Ile Glu Arg Gly
225 230 235 240

Val Val Lys Val Gly Asp Glu Val Glu Ile Val Gly Ile Arg Pro Thr
245 250 255

Gln Lys Thr Thr Val Thr Gly Val Glu Met Phe Arg Lys Glu Leu Glu
260 265 270

Lys Gly Glu Ala Gly Asp Asn Val Gly Val Leu Leu Arg Gly Thr Lys
275 280 285

Lys Glu Glu Val Glu Arg Gly Met Val Leu Cys Lys Pro Gly Ser Ile
290 295 300

Thr Pro His Lys Lys Phe Glu Gly Glu Ile Tyr Val Leu Ser Lys Glu
305 310 315 320

Glu Gly Gly Arg His Thr Pro Phe Phe Thr Asn Tyr Arg Pro Gln Phe
325 330 335

Tyr Val Arg Thr Thr Asp Val Thr Gly Ser Ile Thr Leu Pro Glu Gly
340 345 350

Val Glu Met Val Met Pro Gly Asp Asn Val Lys Ile Thr Val Glu Leu
355 360 365

Ile Ser Pro Val Ala Leu Glu Leu Gly Thr Lys Phe Ala Ile Arg Glu
370 375 380

Gly Gly Arg Thr Val Gly Ala Gly Val Val Ser Asn Ile Ile Glu
385 390 395

(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:
CGCGGATCCG AATGAAAAAA AATATCTTAA AT 32

(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70:
CCGCTCGAGT TACTTGTTGA TAACAATTTT 30

(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:
CGCGGATCCG AATGGCAAAA GAAAAGTTTA AC

(2) INFORMATION FOR SEQ ID NO: 72:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:72:
CCGCTCGAGT TATTCAATAA TATTGCTCAC



(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 

* Met Lys Glu Lys Phe Asn Arg Thr Lys Pro His Val Asn Ile Gly
1 5 10 15

Ile Gly His Val Asp His
20

(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 

Ala His Asn Ala Asn Asn Ala Thr His Asn Thr Lys Lys 
1 5 10

(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Peptide

WO 98/43479 PCT/US98/06421

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75:

Lys Pro Ala His Asn Ala
1 5

(2) INFORMATION FOR SEQ ID NO:76: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:76:

Ile Asp Lys Gln Pro Lys Ala Lys Lys
1 5



(2) INFORMATION FOR SEQ ID NO: 77: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77:

Phe Trp Ala Lys Lys Gln Ala Glu
1 5

(2) INFORMATION FOR SEQ ID NO: 78:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78:
GTGGAGAACA CACAATGAAA AAAAATATC 29

(2) INFORMATION FOR SEQ ID NO: 79:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79:
GCTAATATTA TTCAATAATA TTGCTCACAA C

(2) INFORMATION FOR SEQ ID NO: 80: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80:
GGAGAAATAC AAATGGCAAA AGAAAAG

(2) INFORMATION FOR SEQ ID NO: 81:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81: 
GCTAATATTA TTCAATAATA TTGCTCACAA C 
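
SEQ ID NO:80 and SEQ ID NO:81 are recited in the claims as the 5' and 3' primers for GHPO 750, whose coding sequence is given in SEQ ID NO:67. A minimal Python sketch of how the two primers relate to that template is shown here: the 5' primer should read the same as the coding strand around the start codon, and the 3' primer should match the coding strand around the stop codon once reverse-complemented. The names `reverse_complement`, `HEAD`, and `TAIL` are illustrative, and the template excerpts are short stretches copied from the SEQ ID NO:67 listing rather than the full sequence.

```python
# Complement map for unambiguous upper-case DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


# Excerpts of SEQ ID NO:67 as printed in the listing; spaces are stripped.
# HEAD spans the start codon, TAIL spans the stop codon.
HEAD = "ATTAAGGAGA AATACAA ATG GCA AAA GAA AAG TTT".replace(" ", "")
TAIL = "GGT GTT GTG AGC AAT ATT ATT GAA TAATATTAG CAAAAAGAGA".replace(" ", "")

PRIMER_5 = "GGAGAAATAC AAATGGCAAA AGAAAAG".replace(" ", "")      # SEQ ID NO:80
PRIMER_3 = "GCTAATATTA TTCAATAATA TTGCTCACAA C".replace(" ", "")  # SEQ ID NO:81

# The 5' primer is expected to appear verbatim in the coding strand.
print(PRIMER_5 in HEAD)                      # True
# The 3' primer is expected to appear after reverse complementation.
print(reverse_complement(PRIMER_3) in TAIL)  # True
```

Both checks succeed for the excerpts above, which is consistent with the primer pair flanking the 118...1314 coding region.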

(2) INFORMATION FOR SEQ ID NO: 82: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82: 
CATAACGCAA ATAACGCTAC GCAT 

(2) INFORMATION FOR SEQ ID NO: 83: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83:
GGGAATTCAA AAAAACGAAA AAAACG

(2) INFORMATION FOR SEQ ID NO: 84:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84:
CCCCTCGAGT TAATAGGCAA ACAC 24

What is claimed is:



1. An isolated polynucleotide that encodes:

(i) a polypeptide comprising an amino acid sequence that is homologous
to the amino acid sequence of a Helicobacter membrane-associated
polypeptide, wherein said amino acid sequence of said Helicobacter
membrane-associated polypeptide is selected from the group consisting of the
amino acid sequences as shown:

-in SEQ ID NO:2, beginning with an amino acid in any one of positions
-19 to 5, preferably in position -19 or position 1, and ending with an amino acid
in position 689 (GHPO 386);

-in SEQ ID NO:4, beginning with an amino acid in any one of positions
-20 to 5, preferably in position -20 or position 1, and ending with an amino acid
in position 713 (GHPO 789);

-in SEQ ID NO:6, beginning with an amino acid in any one of positions
-20 to 5, preferably in position -20 or position 1, and ending with an amino acid
in position 725 (GHPO 1516);

-in SEQ ID NO:8, beginning with an amino acid in any one of positions
-20 to 5, preferably in position -20 or position 1, and ending with an amino acid
in position 691 (GHPO 1197);

-in SEQ ID NO:10, beginning with an amino acid in any one of positions
-20 to 5, preferably in position -20 or position 1, and ending with an amino acid
in position 652 (GHPO 1180);

-in SEQ ID NO:12, beginning with an amino acid in any one of positions
-18 to 5, preferably in position -18 or position 1, and ending with an amino acid
in position 673 (GHPO 896);

-in SEQ ID NO:14, beginning with an amino acid in any one of positions
-21 to 5, preferably in position -21 or position 1, and ending with an amino acid
in position 619 (GHPO 711);

-in SEQ ID NO:16, beginning with an amino acid in any one of positions
-17 to 5, preferably in position -17 or position 1, and ending with an amino acid
in position 635 (GHPO 190);

-in SEQ ID NO:18, beginning with an amino acid in any one of positions
-19 to 5, preferably in position -19 or position 1, and ending with an amino acid
in position 626 (GHPO 185);

-in SEQ ID NO:20, beginning with an amino acid in any one of positions
-16 to 5, preferably in position -16 or position 1, and ending with an amino acid
in position 467 (GHPO 1417);

-in SEQ ID NO:22, beginning with an amino acid in any one of positions
-18 to 5, preferably in position -18 or position 1, and ending with an amino acid
in position 673 (GHPO 1414);

-in SEQ ID NO:66, beginning with an amino acid in any one of the
positions from -20 to 5, preferably in position -20 or position 1, and ending
with an amino acid in position 279 (GHPO 1360); and

-in SEQ ID NO:68, beginning with an amino acid in position 1 and
ending with an amino acid in position 399 (GHPO 750); or

(ii) a derivative of the polypeptide.

2. An isolated polynucleotide that encodes: 

(i) a polypeptide comprising an amino acid sequence that is homologous 
to an amino acid sequence selected from the group consisting of the amino acid 
sequences as shown: 

-in SEQ ID NO:2, beginning with amino acid in position -19 and ending
with an amino acid in position 689 (GHPO 386);

-in SEQ ID NO:4, beginning with an amino acid in position -20 and
ending with an amino acid in position 713 (GHPO 789);

-in SEQ ID NO:6, beginning with an amino acid in position -20 and
ending with an amino acid in position 725 (GHPO 1516);

-in SEQ ID NO:8, beginning with an amino acid in position -20 and
ending with an amino acid in position 691 (GHPO 1197);

-in SEQ ID NO:10, beginning with an amino acid in position -20 and
ending with an amino acid in position 652 (GHPO 1180);

-in SEQ ID NO:12, beginning with an amino acid in position -18 and
ending with an amino acid in position 673 (GHPO 896);

-in SEQ ID NO:14, beginning with an amino acid in position -21 and
ending with an amino acid in position 619 (GHPO 711);

-in SEQ ID NO:16, beginning with an amino acid in position -17 and
ending with an amino acid in position 635 (GHPO 190);

-in SEQ ID NO:18, beginning with an amino acid in position -19 and
ending with an amino acid in position 626 (GHPO 185);

-in SEQ ID NO:20, beginning with an amino acid in position -16 and
ending with an amino acid in position 467 (GHPO 1417);

-in SEQ ID NO:22, beginning with an amino acid in position -18 and
ending with an amino acid in position 673 (GHPO 1414);

-in SEQ ID NO:66, beginning with an amino acid in position -20 and
ending with an amino acid in position 279 (GHPO 1360); and

-in SEQ ID NO:68, beginning with an amino acid in position 1 and
ending with an amino acid in position 399 (GHPO 750); or

(ii) a derivative of the polypeptide.

3. The isolated polynucleotide of claim 1, which encodes the mature
form of:

(i) a polypeptide comprising an amino acid sequence that is homologous
to an amino acid sequence selected from the group consisting of the amino acid
sequences as shown:

-in SEQ ID NO:2, beginning with an amino acid in any one of positions
-19 to 5, preferably in position -19 or position 1, and ending with an amino acid
in position 689 (GHPO 386);

-in SEQ ID NO:4, beginning with an amino acid in any one of positions
-20 to 5, preferably in position -20 or position 1, and ending with an amino acid
in position 713 (GHPO 789);

-in SEQ ID NO:6, beginning with an amino acid in any one of positions
-20 to 5, preferably in position -20 or position 1, and ending with an amino acid
in position 725 (GHPO 1516);

-in SEQ ID NO:8, beginning with an amino acid in any one of positions
-20 to 5, preferably in position -20 or position 1, and ending with an amino acid
in position 691 (GHPO 1197);

-in SEQ ID NO:10, beginning with an amino acid in any one of positions
-20 to 5, preferably in position -20 or position 1, and ending with an amino acid
in position 652 (GHPO 1180);

-in SEQ ID NO:12, beginning with an amino acid in any one of positions
-18 to 5, preferably in position -18 or position 1, and ending with an amino acid
in position 673 (GHPO 896);

-in SEQ ID NO:14, beginning with an amino acid in any one of positions
-21 to 5, preferably in position -21 or position 1, and ending with an amino acid
in position 619 (GHPO 711);

-in SEQ ID NO:16, beginning with an amino acid in any one of positions
-17 to 5, preferably in position -17 or position 1, and ending with an amino acid
in position 635 (GHPO 190);

-in SEQ ID NO:18, beginning with an amino acid in any one of positions
-19 to 5, preferably in position -19 or position 1, and ending with an amino acid
in position 626 (GHPO 185);

-in SEQ ID NO:20, beginning with an amino acid in any one of positions
-16 to 5, preferably in position -16 or position 1, and ending with an amino acid
in position 467 (GHPO 1417);

-in SEQ ID NO:22, beginning with an amino acid in any one of positions
-18 to 5, preferably in position -18 or position 1, and ending with an amino acid
in position 673 (GHPO 1414);

-in SEQ ID NO:66, beginning with an amino acid in any one of positions
-20 to 5, preferably in position -20 or position 1, and ending with an amino acid
in position 279 (GHPO 1360); and

-in SEQ ID NO:68, beginning with an amino acid in position 1 and
ending with an amino acid in position 399 (GHPO 750); or

(ii) a derivative of the polypeptide.

4. The isolated polynucleotide of claim 1, 2, or 3, wherein the
polynucleotide is a DNA molecule.

5. The isolated polynucleotide of claim 1, which is a DNA molecule
that can be amplified and/or cloned by polymerase chain reaction from an
Helicobacter genome, using either:

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:23, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:25 (unprocessed GHPO 386);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:26, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:28 (unprocessed GHPO 789);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:29, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:31 (unprocessed GHPO 1516);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:32, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:34 (unprocessed GHPO 1197);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:35, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:37 (unprocessed GHPO 1180);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:38, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:40 (unprocessed GHPO 896);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:41, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:43 (unprocessed GHPO 711);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:44, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:46 (unprocessed GHPO 190);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:47, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:49 (unprocessed GHPO 185);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:50, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:52 (unprocessed GHPO 1417);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:53, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:55 (unprocessed GHPO 1414);

- a 5' oligonucleotide primer comprising a sequence as shown in SEQ ID
NO:78 and a 3' oligonucleotide primer comprising a sequence as shown in SEQ
ID NO:79 (unprocessed GHPO 1360);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:24, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:25 (mature GHPO 386);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:27, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:28 (mature GHPO 789);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:30, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:31 (mature GHPO 1516);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:33, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:34 (mature GHPO 1197);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:36, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:37 (mature GHPO 1180);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:39, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:40 (mature GHPO 896);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:42, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:43 (mature GHPO 711);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:45, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:46 (mature GHPO 190);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:48, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:49 (mature GHPO 185);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:51, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:52 (mature GHPO 1417);

- a 5' oligonucleotide primer having a sequence as shown in SEQ ID
NO:54, and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:55 (mature GHPO 1414);

- a 5' oligonucleotide primer comprising a sequence as shown in SEQ ID
NO:80 and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:81 (GHPO 750); or

- a 5' oligonucleotide primer comprising a sequence as shown in SEQ ID
NO:82 and a 3' oligonucleotide primer having a sequence as shown in SEQ ID
NO:79 (mature GHPO 1360).



6. The isolated DNA molecule of claim 5, which can be amplified
and/or cloned by the polymerase chain reaction from a Helicobacter pylori
genome.

7. The isolated polynucleotide of claim 1, which is a DNA molecule
that encodes the mature form or a derivative of a polypeptide encoded by the
DNA molecule of claim 5.

8. The isolated polynucleotide of claim 1, which is a DNA molecule
that encodes the mature form or a derivative of a polypeptide encoded by the
DNA molecule of claim 6.



9. A compound, in a substantially purified form, that is the mature form
or a derivative of a polypeptide comprising an amino acid sequence that is
homologous to an amino acid sequence of a polypeptide associated with the
Helicobacter membrane, which is selected from the group consisting of the
amino acid sequences as shown:

-in SEQ ID NO:2, beginning with amino acid in position -19 and ending
with an amino acid in position 689 (GHPO 386);

-in SEQ ID NO:4, beginning with an amino acid in position -20 and
ending with an amino acid in position 713 (GHPO 789);

-in SEQ ID NO:6, beginning with an amino acid in position -20 and
ending with an amino acid in position 725 (GHPO 1516);

-in SEQ ID NO:8, beginning with an amino acid in position -20 and
ending with an amino acid in position 691 (GHPO 1197);

-in SEQ ID NO:10, beginning with an amino acid in position -20 and
ending with an amino acid in position 652 (GHPO 1180);

-in SEQ ID NO:12, beginning with an amino acid in position -18 and
ending with an amino acid in position 673 (GHPO 896);

-in SEQ ID NO:14, beginning with an amino acid in position -21 and
ending with an amino acid in position 619 (GHPO 711);

-in SEQ ID NO:16, beginning with an amino acid in position -17 and
ending with an amino acid in position 635 (GHPO 190);

-in SEQ ID NO:18, beginning with an amino acid in position -19 and
ending with an amino acid in position 626 (GHPO 185);

-in SEQ ID NO:20, beginning with an amino acid in position -16 and
ending with an amino acid in position 467 (GHPO 1417);

-in SEQ ID NO:22, beginning with an amino acid in position -18 and
ending with an amino acid in position 673 (GHPO 1414);

-in SEQ ID NO:66, beginning with an amino acid in position -20 and
ending with an amino acid in position 279 (GHPO 1360); and

-in SEQ ID NO:68, beginning with an amino acid in position 1 and
ending with an amino acid in position 399 (GHPO 750); or

(ii) a derivative of said polypeptide.

10. The compound of claim 9, which is the mature form or a derivative
of a polypeptide encoded by a DNA molecule of claim 5.

11. The compound of claim 9, which is the mature form or a derivative
of a polypeptide encoded by a DNA molecule of claim 6.

12. A pharmaceutical composition for preventing or treating
Helicobacter infection in a mammal, said composition comprising a
prophylactically or therapeutically effective amount of a compound of claim 9,
10, or 11 and a pharmaceutically acceptable diluent or carrier.

13. The composition of claim 12, further comprising an antibiotic, an
antisecretory agent, a bismuth salt, or a combination thereof.

14. The composition of claim 13, wherein said antibiotic is selected
from the group consisting of amoxicillin, clarithromycin, tetracycline,
metronidazole, and erythromycin.

15. The composition of claim 13, wherein said bismuth salt is selected
from the group consisting of bismuth subcitrate and bismuth subsalicylate.

16. The composition of claim 13, wherein said antisecretory agent is a
proton pump inhibitor.

17. The composition of claim 16, wherein said proton pump inhibitor is
selected from the group consisting of omeprazole, lansoprazole, and
pantoprazole.

18. The composition of claim 13, wherein said antisecretory agent is an
H2-receptor antagonist.

19. The composition of claim 18, wherein said H2-receptor antagonist is
selected from the group consisting of ranitidine, cimetidine, famotidine,
nizatidine, and roxatidine.

20. The composition of claim 13, wherein said antisecretory agent is a
prostaglandin analog.

21. The composition of claim 20, wherein said prostaglandin analog is
misoprostol or enprostil.

22. The composition of claim 12, which further comprises a
prophylactically or therapeutically effective amount of a second Helicobacter
polypeptide or a derivative thereof.

23. The composition of claim 22, wherein the second Helicobacter
polypeptide is a Helicobacter urease, a subunit, or a derivative thereof.

24. The composition of claim 12, further comprising an adjuvant.

25. A pharmaceutical composition for preventing or treating
Helicobacter infection in a mammal, said composition comprising a
prophylactically or therapeutically effective amount of a polynucleotide of
claim 1, 2, or 3 and a pharmaceutically acceptable carrier or diluent.

26. A pharmaceutical composition for preventing or treating
Helicobacter infection in a mammal, said composition comprising a
prophylactically or therapeutically effective amount of a polynucleotide of
claim 5, 6, or 7 and a pharmaceutically acceptable carrier or diluent.

27. A pharmaceutical composition for preventing or treating
Helicobacter infection in a mammal, said composition comprising a
prophylactically or therapeutically effective amount of a polynucleotide of
claim 8 and a pharmaceutically acceptable carrier or diluent.

28. A composition comprising a viral vector, in the genome of which is
inserted a DNA molecule of claim 4, said DNA molecule being placed under
conditions for expression in a mammalian cell and said viral vector being
admixed with a physiologically acceptable diluent or carrier.

29. The composition of claim 28, wherein said viral vector is a
poxvirus.

30. A composition that comprises a bacterial vector comprising a DNA
molecule of claim 4, said DNA molecule being placed under conditions for
expression and said bacterial vector being admixed with a physiologically
acceptable diluent or carrier.

31. The composition of claim 30, wherein said vector is selected from
the group consisting of Shigella, Salmonella, Vibrio cholerae, Lactobacillus,
Bacille bilié de Calmette-Guérin, and Streptococcus.

32. The composition of claim 25, wherein said polynucleotide is a DNA
molecule that is inserted in a plasmid that is unable to replicate and to
substantially integrate in a mammalian genome and is placed under conditions
for expression in a mammalian cell.

33. An expression cassette comprising a DNA molecule of claim 4, said
DNA molecule being placed under conditions for expression in a procaryotic or
eucaryotic cell.

34. A process for producing a compound of claim 9, which comprises
culturing a procaryotic or eucaryotic cell transformed or transfected with an
expression cassette of claim 33, and recovering said compound from the cell
culture.

35. A pharmaceutical composition for preventing or treating
Helicobacter infection in a mammal, said composition comprising a
prophylactically or therapeutically effective amount of an antibody that binds to
the compound of claim 9, 10, or 11 and a pharmaceutically acceptable carrier
or diluent.

MKK L-L L . . . L . . . L . AEDDGFYTSVGYQIGEAAQMV . NTKGIQ . LS GHPO 386
MKKHILSLALGSLLVSTLSAEDDGFYTSVGYQIGEAAQMVTNTKGIQQLS GHPO 789 
MKKHILSLALGSLLVSTLSAEDDGFYTSVGYQIGEAAQMVTNTKGIQQLS GHPO 1516 

MKK ..L.L.L...L...L. AEDDGFYTSVGYQIGEAAQMV .NTKGIQ . LS Consensus 

DNYE.LNNLL. . YSTLNTLIKLSADPSAIN . .R.NLG.S. .NL. . .K.NS GHPO 386 
DNYENLNNLLTRYSTLNTLIKLSADPSAINAVRENLGAS . KNLIGDKANS GHPO 789 
DNYENLMNLLTRYSTLNTLIKLSADPSAINAVRENLGAS . KNLIGDKANS GHPO 1516 

DNYE . LNNLL . . YSTLNTLIKLSADPSAIN . . R . NLG .S . .NL. . .K.NS Consensus 

PAYQAVLLA . NAAVGLW . V . . YA . T . CG- . G G...FNN.PGQD GHPO 386 

PAYQAVLLAINAAVG .WNV.GY. -T.CG.N.NG.ES IFNN . PG . . GHPO 789 

PAYQAV . LAINAAVGLWN . . GYA- . . CG-NGNG . ES . . G . . IFN . . PGQD GHPO 1516 

PAYQAV . LA . NAAVG . W Y CG FN . . PG . . Consensus 

T ITCN- PG.GGP.S. .N. .K.N.AYQIIQ.AL. . . G.N GHPO 386 

ST ITC PG . GPMSI . NFKKLNEAYQILQ . ALKN — G . P . L . . N GHPO 789 
ST. ITCN- G.G. .MSI. . FKKLNEAYQI . Q . ALKN . .G.P.LG.N GHPO 1516 

.T.ITC G S K.N.AYQI.Q.AL N Consensus 



G. .V.V. .N.T ING. K. .G.K..T S I, GHPO 386
. .KVSV.Y.YTC .G C G.K. .T S.TT.I GHPO 789
G.KVSV.YNY.C ING C. .K TT . . GHPO 1516

.V.V. Consensus

TO TI T ...... .NNAQ . LL . QAS . II . TLNEACP . F . GHPO 386
I T.K.D AQ.LL.QAS. . I . T .NEACP . F . GHPO 789
TQ. . .TI.T T.K.D-. .NNA. . LL . QA . . I . . . LN . . CP . . . GHPO 1516

I A. .LL.QA N. .CP. . . Consensus



GG. . .W.G.S. .G. .CG.F. .EISAIQ.MI .NAQE.VAQSKIV GHPO 386 

TN .T.G. .CG.F. .EISAIQ.MI. .AQE.V.Q GHPO 789 

TN! '. '. '. .GG. . .W-G.ST.G. .C. .F. .E.S MI . NAQE . . AQSKIV GHPO 1516 

G..C..F..E.S MI. .AQE. . .Q. . . . Consensus 



322 SENAQNQN-NLDTGKPFNPYTDASFAQSMLKNAQAQAE . LN . AEQV . KN . GHPO 386 

338 • n Q GKPFNP . TDASFAQ . ML . NA . AQA . MLNLA . QV GHPO 789 

346 SENAQNQN-NLDTGKPFNPYTDASFAQSMLKNAQAQAEM . NL . EQV . KN .. GHPO 1516 

. . N . Q GKPFNP . TDASFAQ .ML.NA. AQA . . . N . . . QV Consensus 

Alignment of three predicted polypeptides from H. pylori that share exact identity at their N-terminus 
(underlined) with the N-terminal amino acid sequence of the mature native 76 kDa protective 
antigen. A consensus sequence is indicated in bold. The amino acid sequence of GHPO 386 shares 
62% identity in a 733 aa overlap with GHPO 789 and 70% identity in a 745 aa overlap with GHPO 
1516. Amino acid positions are numbered to the left of the alignment. 
RYSELGNTYNSITTALSKVPNAQSLQNWSKKNNPYSPQGIETNYYLNQN Consensus 

500 SYNQIQTINQELGRNPFRKVGIVNSQTNNGAMNGIGIQVGYKQFFGQKRK GHPO 386 
SYNQIQTINQELGRNPFRKVGIVNSQTNNGAMNGIGIQVGYKQFFGQKRK Consensus 

550 WGARYYGFFDYNHAFIKSSFFNSASDVWTYGFGADALYNFINDKATNFLG GHPO 386 
575 WGARYYGFFDYNHAFIKSSFFNSASDVWTYGFGADALYNFINDKATNFLG GHPO 789 
WGARYYGFFDYNHAFIKSSFFNSASDVWTYGFGADALYNFINDKATNFLG Consensus 

650 GVRMNLARSKKKGSDHAAQHGIELGLKIPTINTNYYSFMGAELKYRRLYS GHPO 386 
675 GVRMNLARSKKKGSDHAAQHGIELGLKIPTINTNYYSFMGAELKYRRLYS GHPO 789 
687 GVRMNLARSKKKGSDHAAQHGIELGLKIPTINTNYYSFMGAELKYRRLYS GHPO 1516 

GVRMNLARSKKKGSDHAAQHGIELGLKIPTINTNYYSFMGAELKYRRLYS Consensus 

700 VYLNYVFAY GHPO 386 
725 VYLNYVFAY GHPO 789 
737 VYLNXVFAY GHPO 1516 

VYLNYVFAY Consensus 
Alignment of eight related polypeptides from H. pylori that share significant identity with the consensus 
sequence determined for the 76 kDa family, a member of which has been determined to be protective 
in animal models. The amino acid sequence of GHPO 386 shares 53% identity in a 672 aa overlap 
with GHPO 1180, 51% identity in a 691 aa overlap with GHPO 896, 51% identity in a 691 aa overlap with 
GHPO 1414, 63% identity in a 711 aa overlap with GHPO 1197, 44% identity in a 640 aa overlap with 
GHPO 711, 37% identity in a 645 aa overlap with GHPO 185, 36% identity in a 652 aa overlap with 
GHPO 190, and 41% identity in a 483 aa overlap with GHPO 1417. Amino acid positions are numbered 
to the left of the alignment. Gaps (-) have been introduced to maximize alignment. Only absolute 
identities are shown (all other residues are indicated by a dot). 
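The display convention described in the caption (a residue is printed only where every aligned sequence is identical, and a dot elsewhere) and the percent-identity figures can be illustrated with a minimal sketch. This is not the program used to prepare the figures; the function names `consensus` and `percent_identity`, the toy sequences, and the choice to count only columns in which neither sequence carries a gap are all assumptions made for illustration.

```python
def consensus(rows, gap="-"):
    """Return a consensus line for pre-aligned rows of equal length.

    A residue is emitted only where every row carries the same non-gap
    residue; every other column is shown as a dot (assumed convention).
    """
    if not rows:
        raise ValueError("at least one aligned row is required")
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("aligned rows must all have the same length")
    out = []
    for column in zip(*rows):
        first = column[0]
        if first != gap and all(res == first for res in column):
            out.append(first)
        else:
            out.append(".")
    return "".join(out)


def percent_identity(a, b, gap="-"):
    """Percent identity of two aligned rows over their gap-free columns.

    Assumption: the overlap is the set of columns where neither row has
    a gap; alignment programs may define the overlap differently.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    overlap = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not overlap:
        return 0.0
    matches = sum(1 for x, y in overlap if x == y)
    return 100.0 * matches / len(overlap)


# Hypothetical toy rows, not taken from the figures.
rows = ["MKKHIL-SLA", "MKKHIL-SLA", "MKRHILGSLA"]
print(consensus(rows))
print(round(percent_identity(rows[0], rows[2]), 1))
```

In the toy rows, the third column differs in one sequence and the seventh column contains gaps, so both appear as dots in the consensus line.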